Thread: WIP: Data at rest encryption

WIP: Data at rest encryption

From
Ants Aasma
Date:
Hi all,

I have been working on data-at-rest encryption support for PostgreSQL.
In my experience this is a common request that customers make. The
short of the feature is that all PostgreSQL data files are encrypted
with a single master key and are decrypted when read from the OS. It
does not provide column level encryption which is an almost orthogonal
feature, arguably better done client side.

Similar things can be achieved with filesystem level encryption.
However this is not always optimal for various reasons. One of the
better reasons is the desire for HSM based encryption in a storage
area network based setup.

Attached to this mail is a work in progress patch that adds an
extensible encryption mechanism. There are some loose ends left to tie
up, but the general concept and architecture is at a point where it's
ready for some feedback, fresh ideas and bikeshedding.

Usage
=====

Set up database like so:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     initdb -k -K pgcrypto $PGDATA )

Start PostgreSQL:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     postgres $PGDATA )

Design
======

The patch adds a new GUC called encryption_library, when specified the
named library is loaded before shared_preload_libraries and is
expected to register its encryption routines. For now the API is
pretty narrow, one parameterless function that lets the extension do
key setup on its own terms, and two functions for
encrypting/decrypting an arbitrary sized block of data with tweak. The
tweak should alter the encryption function so that identical block
contents are encrypted differently based on their location. The GUC
needs to be set at bootstrap time, so it gets set by a new option for
initdb. During bootstrap an encryption sample gets stored in the
control file, enabling useful error messages.

The library name is not stored in controldata. I'm not quite sure
about this decision. On one hand it would be very useful to tell the
user what he needs to get at his data if the configuration somehow
goes missing and it would get rid of the extra GUC. On the other hand
I don't really want to bloat control data, and the same encryption
algorithm could be provided by different implementations.

For now the encryption is done for everything that goes through md,
xlog and slru. Based on a review of read/write/fread/fwrite calls this
list is missing:

* BufFile - needs refactoring
* Logical reorder buffer serialization - probably needs a stream mode
cipher API addition.
* logical_heap_rewrite - can be encrypted as one big block
* 2PC state data - ditto
* pg_stat_statements - query texts get appended so a stream mode
cipher might be needed here too.

copydir needed some changes too because tablespace and database oid
are included in the tweak and so copying also needs to decrypt and
encrypt with the new tweak value.

For demonstration purposes I imported Brian Gladman's AES-128-XTS mode
implementation into pgcrypto and used an environment variable for key
setup. This part is not really in any reviewable state, the XTS code
needs heavy cleanup to bring it up to PostgreSQL coding standards,
keysetup needs something secure, like PBKDF2 or scrypt.

Performance with current AES implementation is not great, but not
horrible either, I'm seeing around 2x slowdown for larger than
shared_buffers, smaller than free memory workloads. However the plan
is to fix this - I have a prototype AES-NI implementation that does
3GB/s per core on my Haswell based laptop (1.25 B/cycle).

Open questions
==============

The main questions is what to do about BufFile? It currently provides
both unaligned random access and a block based interface. I wonder if
it would be a good idea to refactor it to be fully block based under
the covers.

I would also like to incorporate some database identifier as a salt in
key setup. However, system identifier stored in control file doesn't
fit this role well. It gets initialized somewhat too late in the
bootstrap process, and more importantly, gets changed on pg_upgrade.
This will make link mode upgrades impossible, which seems like a no
go. I'm torn whether to add a new value for this purpose (perhaps
stored outside the control file) or allow setting of system identifier
via initdb. The first seems like a better idea, the file could double
as a place to store additional encryption parameters, like key length
or different cipher primitive.

Regards,
Ants Aasma

Attachment

Re: WIP: Data at rest encryption

From
Haribabu Kommi
Date:
On Tue, Jun 7, 2016 at 11:56 PM, Ants Aasma <ants.aasma@gmail.com> wrote:
> Hi all,
>
> I have been working on data-at-rest encryption support for PostgreSQL.
> In my experience this is a common request that customers make. The
> short of the feature is that all PostgreSQL data files are encrypted
> with a single master key and are decrypted when read from the OS. It
> does not provide column level encryption which is an almost orthogonal
> feature, arguably better done client side.
>
> Similar things can be achieved with filesystem level encryption.
> However this is not always optimal for various reasons. One of the
> better reasons is the desire for HSM based encryption in a storage
> area network based setup.
>
> Attached to this mail is a work in progress patch that adds an
> extensible encryption mechanism. There are some loose ends left to tie
> up, but the general concept and architecture is at a point where it's
> ready for some feedback, fresh ideas and bikeshedding.

Yes, encryption is really a nice and wanted feature.
Following are my thoughts regarding the approach.

1. Instead of doing the entire database files encryption, how about
providing user an option to protect only some particular tables that
wants the encryption at table/tablespace level. This not only provides
an option to the user, it reduces the performance impact on tables
that doesn't need any encryption. The problem with this approach
is that every xlog record needs to validate to handle the encryption
/decryption, instead of at page level.

2. Instead of depending on a contrib module for the encryption, how
about integrating pgcrypto contrib in to the core and add that as a
default encryption method. And also provide an option to the user
to use a different encryption methods if needs.

3. Currently entire xlog pages are encrypted and stored in the file.
can pg_xlogdump works with those files?

4. For logical decoding, how about the adding a decoding behavior
based on the module to decide whether data to be encrypted/decrypted.

5. Instead of providing passphrase through environmental variable,
better to provide some options to pg_ctl etc.

6. I don't have any idea whether is it possible to integrate the checksum
and encryption in a single shot to avoid performance penalty.


> I would also like to incorporate some database identifier as a salt in
> key setup. However, system identifier stored in control file doesn't
> fit this role well. It gets initialized somewhat too late in the
> bootstrap process, and more importantly, gets changed on pg_upgrade.
> This will make link mode upgrades impossible, which seems like a no
> go. I'm torn whether to add a new value for this purpose (perhaps
> stored outside the control file) or allow setting of system identifier
> via initdb. The first seems like a better idea, the file could double
> as a place to store additional encryption parameters, like key length
> or different cipher primitive.

I feel separate file is better to include the key data instead of pg_control
file.


Regards,
Hari Babu
Fujitsu Australia



Re: WIP: Data at rest encryption

From
Ants Aasma
Date:
On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> 1. Instead of doing the entire database files encryption, how about
> providing user an option to protect only some particular tables that
> wants the encryption at table/tablespace level. This not only provides
> an option to the user, it reduces the performance impact on tables
> that doesn't need any encryption. The problem with this approach
> is that every xlog record needs to validate to handle the encryption
> /decryption, instead of at page level.

Is there a real need for this? The customers I have talked to want to
encrypt the whole database and my goal is to make the feature fast
enough to make that feasible for pretty much everyone. I guess
switching encryption off per table would be feasible, but the key
setup would still need to be done at server startup. Per record
encryption would result in some additional information leakage though.
Overall I thought it would not be worth it, but I'm willing to have my
mind changed on this.

> 2. Instead of depending on a contrib module for the encryption, how
> about integrating pgcrypto contrib in to the core and add that as a
> default encryption method. And also provide an option to the user
> to use a different encryption methods if needs.

Technically that would be simple enough, this is more of a policy
decision. I think having builtin encryption provided by pgcrypto is
completely fine. If a consensus emerges that it needs to be
integrated, it would need to be a separate patch anyway.

> 3. Currently entire xlog pages are encrypted and stored in the file.
> can pg_xlogdump works with those files?

Technically yes, with the patch as it stands, no. Added this to my todo list.

> 4. For logical decoding, how about the adding a decoding behavior
> based on the module to decide whether data to be encrypted/decrypted.

The data to be encrypted does not depend on the module used, so I
don't think it should be module controlled. The reorder buffer
contains pretty much the same stuff as the xlog, so not encrypting it
does not look like a valid choice. For logical heap rewrites it could
be argued that nothing useful is leaked in most cases, but encrypting
it is not hard. Just a small matter of programming.

> 5. Instead of providing passphrase through environmental variable,
> better to provide some options to pg_ctl etc.

That looks like it would be worse from a security perspective.
Integrating a passphrase prompt would be an option, but a way for
scripts to provide passphrases would still be needed.

> 6. I don't have any idea whether is it possible to integrate the checksum
> and encryption in a single shot to avoid performance penalty.

Currently no, the checksum gets stored in the page header and for any
decent cipher mode the encryption of the rest of the page will depend
on it. However, the performance difference should be negligible
because both algorithms are compute bound for cached data. The data is
very likely to be completely in L1 cache as the operations are done in
quick succession.

The non-cryptographic checksum algorithm could actually be an attack
vector for an adversary that can trigger repeated encryption by
tweaking a couple of bytes at the end of the page to see when the
checksum matches and try to infer the data from that. Similarly to the
CRIME attack. However the LSN stored at the beginning of the page
header basically provides a nonce that makes this impossible.

This also means that encryption needs to imply wal_log_hints. Will
include this in the next version of the patch.

>> I would also like to incorporate some database identifier as a salt in
>> key setup. However, system identifier stored in control file doesn't
>> fit this role well. It gets initialized somewhat too late in the
>> bootstrap process, and more importantly, gets changed on pg_upgrade.
>> This will make link mode upgrades impossible, which seems like a no
>> go. I'm torn whether to add a new value for this purpose (perhaps
>> stored outside the control file) or allow setting of system identifier
>> via initdb. The first seems like a better idea, the file could double
>> as a place to store additional encryption parameters, like key length
>> or different cipher primitive.
>
> I feel separate file is better to include the key data instead of pg_control
> file.

I guess that would be more flexible. However I think at least the fact
that the database is encrypted should remain in the control file to
provide useful error messages for faulty backup procedures.

Thanks for your input.

Regards,
Ants Aasma



Re: WIP: Data at rest encryption

From
Michael Paquier
Date:
On Sun, Jun 12, 2016 at 4:13 PM, Ants Aasma <ants.aasma@gmail.com> wrote:
>> I feel separate file is better to include the key data instead of pg_control
>> file.
>
> I guess that would be more flexible. However I think at least the fact
> that the database is encrypted should remain in the control file to
> provide useful error messages for faulty backup procedures.

Another possibility could be always to do some encryption at data-type
level for text data. For example I recalled the following thing while
going through this thread:
https://github.com/nec-postgres/tdeforpg
Though I don't quite understand the use for encrypt.enable in this
code... This has the advantage to not patch upstream.
-- 
Michael



Re: WIP: Data at rest encryption

From
Ants Aasma
Date:
On Mon, Jun 13, 2016 at 5:17 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sun, Jun 12, 2016 at 4:13 PM, Ants Aasma <ants.aasma@gmail.com> wrote:
>>> I feel separate file is better to include the key data instead of pg_control
>>> file.
>>
>> I guess that would be more flexible. However I think at least the fact
>> that the database is encrypted should remain in the control file to
>> provide useful error messages for faulty backup procedures.
>
> Another possibility could be always to do some encryption at data-type
> level for text data. For example I recalled the following thing while
> going through this thread:
> https://github.com/nec-postgres/tdeforpg
> Though I don't quite understand the use for encrypt.enable in this
> code... This has the advantage to not patch upstream.

While certainly possible, this does not cover the requirements I want
to satisfy - user data never gets stored on disk unencrypted without
making changes to the application or schema. This seems to be mostly
about separating administrator roles, specifically that centralised
storage and backup administrators should not have access to database
contents. I see this as orthogonal to per column encryption, which in
my opinion is better done in the application.

Regards,
Ants Aasma



Re: WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/7/16 9:56 AM, Ants Aasma wrote:
> Similar things can be achieved with filesystem level encryption.
> However this is not always optimal for various reasons. One of the
> better reasons is the desire for HSM based encryption in a storage
> area network based setup.

Could you explain this in more detail?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/12/16 3:13 AM, Ants Aasma wrote:
>> 5. Instead of providing passphrase through environmental variable,
>> > better to provide some options to pg_ctl etc.
> That looks like it would be worse from a security perspective.
> Integrating a passphrase prompt would be an option, but a way for
> scripts to provide passphrases would still be needed.

Environment variables and command-line options are visible to other 
processes on the machine, so neither of these approaches is really going 
to work.  We would need some kind of integration with secure 
password-entry mechanisms, such as pinentry.

Also note that all tools that work directly on the data directory would 
need password-entry and encryption/decryption support, including 
pg_basebackup, pg_controldata, pg_ctl, pg_receivexlog, pg_resetxlog, 
pg_rewind, pg_upgrade, pg_xlogdump.

It seems that your implementation doesn't encrypt pg_control, thus 
avoiding some of that.  But that doesn't seem right.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Data at rest encryption

From
Haribabu Kommi
Date:
On Sun, Jun 12, 2016 at 5:13 PM, Ants Aasma <ants.aasma@gmail.com> wrote:
> On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>
>> 2. Instead of depending on a contrib module for the encryption, how
>> about integrating pgcrypto contrib in to the core and add that as a
>> default encryption method. And also provide an option to the user
>> to use a different encryption methods if needs.
>
> Technically that would be simple enough, this is more of a policy
> decision. I think having builtin encryption provided by pgcrypto is
> completely fine. If a consensus emerges that it needs to be
> integrated, it would need to be a separate patch anyway.

In our proprietary database, we are using the encryption methods
provided by openSSL [1]. May be we can have a look at those
methods provided by openSSL for the use of encryption for builds
under USE_SSL. Ignore it if you have already validated.


>> 5. Instead of providing passphrase through environmental variable,
>> better to provide some options to pg_ctl etc.
>
> That looks like it would be worse from a security perspective.
> Integrating a passphrase prompt would be an option, but a way for
> scripts to provide passphrases would still be needed.

What I felt was, if we store the passphrase in an environmental variable,
a person who is having an access to the system can get the details
and using that it may be possible to decrypt the data files.


[1] - https://www.openssl.org/docs/manmaster/crypto/EVP_EncryptInit.html


Regards,
Hari Babu
Fujitsu Australia



Re: WIP: Data at rest encryption

From
Jim Nasby
Date:
On 6/12/16 2:13 AM, Ants Aasma wrote:
> On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> > 1. Instead of doing the entire database files encryption, how about
>> > providing user an option to protect only some particular tables that
>> > wants the encryption at table/tablespace level. This not only provides
>> > an option to the user, it reduces the performance impact on tables
>> > that doesn't need any encryption. The problem with this approach
>> > is that every xlog record needs to validate to handle the encryption
>> > /decryption, instead of at page level.
> Is there a real need for this? The customers I have talked to want to
> encrypt the whole database and my goal is to make the feature fast
> enough to make that feasible for pretty much everyone. I guess
> switching encryption off per table would be feasible, but the key
> setup would still need to be done at server startup. Per record
> encryption would result in some additional information leakage though.
> Overall I thought it would not be worth it, but I'm willing to have my
> mind changed on this.

I actually design with this in mind. Tables that contain sensitive info 
go into designated schemas, partly so that you can blanket move all of 
those to an encrypted tablespace (or safer would be to move things not 
in those schemas to an unencrypted tablespace). Since that can be done 
with an encrypted filesystem maybe that's good enough. (It's not really 
clear to me what this buys us over an encrypted FS, other than a feature 
comparison checkmark...)
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: WIP: Data at rest encryption

From
PostgreSQL - Hans-Jürgen Schönig
Date:

On 06/14/2016 09:59 PM, Jim Nasby wrote:
> On 6/12/16 2:13 AM, Ants Aasma wrote:
>> On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>>> > 1. Instead of doing the entire database files encryption, how about
>>> > providing user an option to protect only some particular tables that
>>> > wants the encryption at table/tablespace level. This not only 
>>> provides
>>> > an option to the user, it reduces the performance impact on tables
>>> > that doesn't need any encryption. The problem with this approach
>>> > is that every xlog record needs to validate to handle the encryption
>>> > /decryption, instead of at page level.
>> Is there a real need for this? The customers I have talked to want to
>> encrypt the whole database and my goal is to make the feature fast
>> enough to make that feasible for pretty much everyone. I guess
>> switching encryption off per table would be feasible, but the key
>> setup would still need to be done at server startup. Per record
>> encryption would result in some additional information leakage though.
>> Overall I thought it would not be worth it, but I'm willing to have my
>> mind changed on this.
>
> I actually design with this in mind. Tables that contain sensitive 
> info go into designated schemas, partly so that you can blanket move 
> all of those to an encrypted tablespace (or safer would be to move 
> things not in those schemas to an unencrypted tablespace). Since that 
> can be done with an encrypted filesystem maybe that's good enough. 
> (It's not really clear to me what this buys us over an encrypted FS, 
> other than a feature comparison checkmark...)

the reason why this is needed is actually very simple: security 
guidelines and legal requirements ...
we have dealt with a couple of companies recently, who explicitly 
demanded PostgreSQL level encryption in a transparent way to fulfill 
some internal or legal requirements. this is especially true for 
financial stuff. and yes, sure ... you can do a lot of stuff with 
filesystem encryption.
the core idea of this entire thing is however to have a counterpart on 
the database level. if you don't have the key you cannot start the 
instance and if you happen to get access to the filesystem you are still 
not able to fire up the DB.
as it said: requirements by ever bigger companies.

as far as benchmarking is concerned: i did a quick test yesterday (not 
with the final AES implementation yet) and i got pretty good results. 
with a reasonably well cached database in a typical application I expect 
to loose around 10-20%. if everything fits in memory there is 0 loss of 
course. the worst I got with the standard AES (no hardware support used 
yet) I lost around 45% or so. but this requires a value as low as 32 MB 
of shared buffers or so.
    many thanks,
        hans


-- 
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] WIP: Data at rest encryption

From
Robert Haas
Date:
On Mon, Jun 13, 2016 at 11:07 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 6/7/16 9:56 AM, Ants Aasma wrote:
>>
>> Similar things can be achieved with filesystem level encryption.
>> However this is not always optimal for various reasons. One of the
>> better reasons is the desire for HSM based encryption in a storage
>> area network based setup.
>
> Could you explain this in more detail?

I don't think Ants ever responded to this point.

I'm curious whether this is something that is likely to be pursued for
PostgreSQL 11.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Data at rest encryption

From
Ants Aasma
Date:
On Mon, Jun 12, 2017 at 10:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 13, 2016 at 11:07 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 6/7/16 9:56 AM, Ants Aasma wrote:
>>>
>>> Similar things can be achieved with filesystem level encryption.
>>> However this is not always optimal for various reasons. One of the
>>> better reasons is the desire for HSM based encryption in a storage
>>> area network based setup.
>>
>> Could you explain this in more detail?
>
> I don't think Ants ever responded to this point.
>
> I'm curious whether this is something that is likely to be pursued for
> PostgreSQL 11.

Yes, the plan is to pick it up again, Real Soon Now(tm). There are a
couple of loose ends for stuff that should be encrypted, but in the
current state of the patch aren't yet (from the top of my head,
logical decoding and pg_stat_statements write some files). The code
handling keys could really take better precautions as Peter pointed
out in another e-mail. And I expect there to be a bunch of polishing
work to make the APIs as good as they can be.

To answer Peter's question about HSMs, many enterprise deployments are
on top of shared storage systems. For regulatory reasons or to limit
security clearance of storage administrators, the data on shared
storage should be encrypted. Now for there to be any point to this
endeavor, the key needs to be stored somewhere else. This is where
hardware security modules come in. They are basically hardware key
storage appliances that can either output the key when requested, or
for higher security hold onto the key and perform
encryption/decryption on behalf of the user. The patch enables the
user to use a custom shell command to go and fetch the key from the
HSM, for example using the KMIP protocol. Or a motivated person could
write an extension that implements the encryption hooks to delegate
encryption/decryption of blocks to an HSM.

Fundamentally there doesn't seem to be a big benefit of implementing
the encryption at PostgreSQL level instead of the filesystem. The
patch doesn't take any real advantage from the higher level knowledge
of the system, nor do I see much possibility for it to do that. The
main benefit for us is that it's much easier to get a PostgreSQL based
solution deployed.

I'm curious if the community thinks this is a feature worth having?
Even considering that security experts would classify this kind of
encryption as a checkbox feature.

Regards,
Ants Aasma



Re: [HACKERS] WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/12/17 17:11, Ants Aasma wrote:
> I'm curious if the community thinks this is a feature worth having?
> Even considering that security experts would classify this kind of
> encryption as a checkbox feature.

File system encryption already exists and is well-tested.  I don't see
any big advantages in re-implementing all of this one level up.  You
would have to touch every single place in PostgreSQL backend and tool
code where a file is being read or written.  Yikes.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Ants, all,

* Ants Aasma (ants.aasma@eesti.ee) wrote:
> Yes, the plan is to pick it up again, Real Soon Now(tm). There are a
> couple of loose ends for stuff that should be encrypted, but in the
> current state of the patch aren't yet (from the top of my head,
> logical decoding and pg_stat_statements write some files). The code
> handling keys could really take better precautions as Peter pointed
> out in another e-mail. And I expect there to be a bunch of polishing
> work to make the APIs as good as they can be.

Very glad to hear that you're going to be continuing to work on this
effort.

> To answer Peter's question about HSMs, many enterprise deployments are
> on top of shared storage systems. For regulatory reasons or to limit
> security clearance of storage administrators, the data on shared
> storage should be encrypted. Now for there to be any point to this
> endeavor, the key needs to be stored somewhere else. This is where
> hardware security modules come in. They are basically hardware key
> storage appliances that can either output the key when requested, or
> for higher security hold onto the key and perform
> encryption/decryption on behalf of the user. The patch enables the
> user to use a custom shell command to go and fetch the key from the
> HSM, for example using the KMIP protocol. Or a motivated person could
> write an extension that implements the encryption hooks to delegate
> encryption/decryption of blocks to an HSM.

An extension, or perhaps even something built-in, would certainly be
good here but I don't think it's necessary in an initial implementation
as long as it's something we can do later.

> Fundamentally there doesn't seem to be a big benefit of implementing
> the encryption at PostgreSQL level instead of the filesystem. The
> patch doesn't take any real advantage from the higher level knowledge
> of the system, nor do I see much possibility for it to do that. The
> main benefit for us is that it's much easier to get a PostgreSQL based
> solution deployed.

Making it easier to get a PostgreSQL solution deployed is certainly a
very worthwhile goal.

> I'm curious if the community thinks this is a feature worth having?
> Even considering that security experts would classify this kind of
> encryption as a checkbox feature.

I would say that some security experts would consider it a 'checkbox'
feature, while others would say that it's actually a quite useful
capability for a database to have and isn't just for being able to check
a given box.  I tended to lean towards the 'checkbox' camp and
encouraged people to use filesystem encryption also, but there are
use-cases where it'd be really nice to be able to have PG doing the
encryption instead of the filesystem because then you can do things like
backup the database, copy it somewhere else directly, and then restore
it using the regular PG mechanisms, as long as you have access to the
key.  That's not something you can directly do with filesystem-level
encryption (unless you happen to be lucky enough to be able to use ZFS,
which can do exporting, or you can do a block-level exact copy to an
exactly identically sized partition on the remote side or similar..),
and while you could encrypt the PG files during the backup, that
requires that you make sure both sides agree on how that encryption is
done and have the same tools for performing the encryption/decryption.
Possible, certainly, but not nearly as convenient.

+1 for having this capability.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/13/17 09:24, Stephen Frost wrote:
> but there are
> use-cases where it'd be really nice to be able to have PG doing the
> encryption instead of the filesystem because then you can do things like
> backup the database, copy it somewhere else directly, and then restore
> it using the regular PG mechanisms, as long as you have access to the
> key.  That's not something you can directly do with filesystem-level
> encryption

Interesting point.

I wonder what the proper extent of "encryption at rest" should be.  If
you encrypt just on a file or block level, then someone looking at the
data directory or a backup can still learn a number of things about the
number of tables, transaction rates, various configuration settings, and
so on.  In the scenario of a sensitive application hosted on a shared
SAN, I don't think that is good enough.

Also, in the use case you describe, if you use pg_basebackup to make a
direct encrypted copy of a data directory, I think that would mean you'd
have to keep using the same key for all copies.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> I wonder what the proper extent of "encryption at rest" should be.  If
> you encrypt just on a file or block level, then someone looking at the
> data directory or a backup can still learn a number of things about the
> number of tables, transaction rates, various configuration settings, and
> so on.  In the scenario of a sensitive application hosted on a shared
> SAN, I don't think that is good enough.

If someone has access to the SAN, it'd be very difficult to avoid
revealing some information about transaction rates or I/O throughput.
Being able to have the configuration files encrypted would be good
(thinking particularly about pg_hba.conf/pg_ident.conf) but I don't
know that it's strictly necessary or that it would have to be done in an
initial version.

Certainly, there is a trade-off here when it comes to the information
which someone can learn about the system by looking at the number and
sizes of files from using PG-based encryption vs. what information
someone can learn from being able to look at only an encrypted
filesystem, but that's a trade-off which security experts are good at
making a determination on and will be case-by-case, based on how easy
setting up filesystem-encryption is in a particular environment and what
the use-cases are for the system.

> Also, in the use case you describe, if you use pg_basebackup to make a
> direct encrypted copy of a data directory, I think that would mean you'd
> have to keep using the same key for all copies.

That's true, but that might be acceptable and possibly even desirable in
certain cases.  On the other hand, it would certainly be a useful
feature to have a way to migrate from one key to another.  Perhaps that
would start out as an off-line tool, but maybe we'd be able to work out
a way to support having it done on-line in the future (certainly
non-trivial, but if we supported multiple keys concurrently with a
preference for which key is used to write data back out, and required
that checksums be in place to allow us to test if decrypting with a
specific key worked ... lots more hand-waving here... ).

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Robert Haas
Date:
On Mon, Jun 12, 2017 at 5:11 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> Fundamentally there doesn't seem to be a big benefit of implementing
> the encryption at PostgreSQL level instead of the filesystem. The
> patch doesn't take any real advantage from the higher level knowledge
> of the system, nor do I see much possibility for it to do that. The
> main benefit for us is that it's much easier to get a PostgreSQL based
> solution deployed.

I agree with all of that, but ease of deployment has some value unto
itself.  I think pretty much every modern operating system has some
way of encrypting a filesystem, but it's different on Linux vs.
Windows vs. macOS vs. BSD, and you probably need to be the system
administrator on any of those systems in order to set it up.
Something built into PostgreSQL could run without administrator
privileges and work the same way on every platform we support.  That
would be useful.

Of course, what would be even more useful is fine-grained encryption -
encrypt these tables (and the corresponding indexes, toast tables, and
WAL records related to any of that) with this key, encrypt these other
tables (and the same list of associated stuff) with this other key,
and leave the rest unencrypted.  The problem with that is that you
probably can't run recovery without all of the keys, and even on a
clean startup there would be a good deal of engineering work involved
in refusing access to tables whose key hadn't been provided yet.  I
don't think we should wait to have this feature until all of those
problems are solved.  In my opinion, something coarse-grained that
just encrypts the whole cluster would be a pretty useful place to
start and would meet the needs of enough people to be worthwhile all
on its own.  Performance is likely to be poor on large databases,
because every time a page transits between shared_buffers and the
buffer cache we've got to en/decrypt, but as long as it's only poor
for the people who opt into the feature I don't see a big problem with
that.

I anticipate that one of the trickier problems here will be handling
encryption of the write-ahead log.  Suppose you encrypt WAL a block at
a time.  In the current system, once you've written and flushed a
block, you can consider it durably committed, but if that block is
encrypted, this is no longer true.  A crash might tear the block,
making it impossible to decrypt.  Replay will therefore stop at the
end of the previous block, not at the last record actually flushed as
would happen today.  So, your synchronous_commit suddenly isn't.  A
similar problem will occur any other page where we choose not to
protect against torn pages using full page writes.  For instance,
unless checksums are enabled or wal_log_hints=on, we'll write a data
page where a single bit has been flipped and assume that the bit will
either make it to disk or not; the page can't really be torn in any
way that hurts us.  But with encryption that's no longer true, because
the hint bit will turn into much more than a single bit flip, and
rereading that page with half old and half new contents will be the
end of the world (TM).  I don't know off-hand whether we're
protecting, say, CLOG page writes with FPWs.: because setting a couple
of bits is idempotent and doesn't depend on the existing page
contents, we might not need it currently, but with encryption, every
bit in the page depends on every other bit in the page, so we
certainly would.  I don't know how many places we've got assumptions
like this baked into the system, but I'm guessing there are a bunch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 11:35:03AM -0400, Robert Haas wrote:
> I anticipate that one of the trickier problems here will be handling
> encryption of the write-ahead log.  Suppose you encrypt WAL a block at
> a time.  In the current system, once you've written and flushed a
> block, you can consider it durably committed, but if that block is
> encrypted, this is no longer true.  A crash might tear the block,
> making it impossible to decrypt.  Replay will therefore stop at the
> end of the previous block, not at the last record actually flushed as
> would happen today.  So, your synchronous_commit suddenly isn't.  A
> similar problem will occur any other page where we choose not to
> protect against torn pages using full page writes.  For instance,
> unless checksums are enabled or wal_log_hints=on, we'll write a data
> page where a single bit has been flipped and assume that the bit will
> either make it to disk or not; the page can't really be torn in any
> way that hurts us.  But with encryption that's no longer true, because
> the hint bit will turn into much more than a single bit flip, and
> rereading that page with half old and half new contents will be the
> end of the world (TM).  I don't know off-hand whether we're
> protecting, say, CLOG page writes with FPWs.: because setting a couple
> of bits is idempotent and doesn't depend on the existing page
> contents, we might not need it currently, but with encryption, every
> bit in the page depends on every other bit in the page, so we
> certainly would.  I don't know how many places we've got assumptions
> like this baked into the system, but I'm guessing there are a bunch.

That is not necessary true.  You are describing a cipher mode where the
user data goes through the cipher, e.g. AES in CBC mode.  However, if
you are using a stream cipher based on a block cipher, e.g. CTR, GCM,
you XOR the user data with a random bit stream, and in that case one bit
change in user data would be one bit change in the cipher output.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 11:04:21AM -0400, Stephen Frost wrote:
> > Also, in the use case you describe, if you use pg_basebackup to make a
> > direct encrypted copy of a data directory, I think that would mean you'd
> > have to keep using the same key for all copies.
> 
> That's true, but that might be acceptable and possibly even desirable in
> certain cases.  On the other hand, it would certainly be a useful
> feature to have a way to migrate from one key to another.  Perhaps that
> would start out as an off-line tool, but maybe we'd be able to work out
> a way to support having it done on-line in the future (certainly
> non-trivial, but if we supported multiple keys concurrently with a
> preference for which key is used to write data back out, and required
> that checksums be in place to allow us to test if decrypting with a
> specific key worked ... lots more hand-waving here... ).

As I understand it, having encryption in the database means the key is
stored in the database, while having encryption in the file system means
the key is stored in the operating system somewhere.  Of course, if the
key stored in the database is visible to someone using the operating
system, we really haven't added much/any security --- I guess my point
is that the OS easily can hide the key from the database, but the
database can't easily hide the key from the operating system.

Of course, if the storage is split from the database server then having
the key on the database server seems like a win.  However, I think a db
server could easily encrypt blocks before sending them to the SAN
server.  This would not work for NAS, of course, since it is file-based.

I have to admit we tend to avoid heavy-API solutions that are designed
just to work around deployment challenges.  Commercial databases are
fine in doing that, but it leads to very complex products.

I think the larger issue is where to store the key.  I would love for us
to come up with a unified solution to that and then build encryption on
that, including all-cluster encryption.

One cool idea I have is using public encryption to store the encryption
key by users who don't know the decryption key, e.g. RSA.  It would be a
write-only encryption option.  Not sure how useful that is, but it
easily possible, and doesn't require us to keep the _encryption_ key
secret, just the decryption one.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 11:04:21AM -0400, Stephen Frost wrote:
> > > Also, in the use case you describe, if you use pg_basebackup to make a
> > > direct encrypted copy of a data directory, I think that would mean you'd
> > > have to keep using the same key for all copies.
> >
> > That's true, but that might be acceptable and possibly even desirable in
> > certain cases.  On the other hand, it would certainly be a useful
> > feature to have a way to migrate from one key to another.  Perhaps that
> > would start out as an off-line tool, but maybe we'd be able to work out
> > a way to support having it done on-line in the future (certainly
> > non-trivial, but if we supported multiple keys concurrently with a
> > preference for which key is used to write data back out, and required
> > that checksums be in place to allow us to test if decrypting with a
> > specific key worked ... lots more hand-waving here... ).
>
> As I understand it, having encryption in the database means the key is
> stored in the database, while having encryption in the file system means
> the key is stored in the operating system somewhere.

Key management is an entirely independent discussion from this and the
proposal from Ants, as I understand it, is that the key would *not* be
in the database but could be anywhere that a shell command could get it
from, including possibly a HSM (hardware device).

Having the data encrypted by PostgreSQL does not mean the key is stored
in the database.

> Of course, if the
> key stored in the database is visible to someone using the operating
> system, we really haven't added much/any security --- I guess my point
> is that the OS easily can hide the key from the database, but the
> database can't easily hide the key from the operating system.

This is correct- the key must be available to the PostgreSQL process
and therefore someone with privileged access to the OS would be able to
retrieve the key, but that's also true of filesystem encryption.

Basically, if the server is doing the encryption and you have the
ability to read all memory on the server then you can get the key.  Of
course, if you can read all memory then you can just look at shared
buffers and you don't really need to bother yourself with the key or
the encryption, and it doesn't make any difference if you're encrypting
in the database or in the filesystem.  That attack vector is not one
which this is intending to address.

> Of course, if the storage is split from the database server then having
> the key on the database server seems like a win.  However, I think a db
> server could easily encrypt blocks before sending them to the SAN
> server.  This would not work for NAS, of course, since it is file-based.

The key doesn't necessairly have to be stored anywhere on the server- it
just needs to be kept in memory while the database process is running
and made available to the database at startup, unless an external system
is used to perform the encryption, which might be possible with an
extension, as discussed.  In some environments, it might be acceptable
to have the key stored on the database server, of course, but there's no
requirement for the key to be stored on the database server or in the
database at all.

> I have to admit we tend to avoid heavy-API solutions that are designed
> just to work around deployment challenges.  Commercial databases are
> fine in doing that, but it leads to very complex products.

I'm not following what you mean here.

> I think the larger issue is where to store the key.  I would love for us
> to come up with a unified solution to that and then build encryption on
> that, including all-cluster encryption.

Honestly, key management is something that I'd rather we *not* worry
about in an initial implementation, which is one reason that I liked the
approach discussed here of having a command which runs to provide the
key.  We could certainly look into improving that in the future, but key
management really is a largely independent issue from encryption and
it's much more difficult and complicated and whatever we come up with
would still almost certainly be usable with the approach proposed here.

> One cool idea I have is using public encryption to store the encryption
> key by users who don't know the decryption key, e.g. RSA.  It would be a
> write-only encryption option.  Not sure how useful that is, but it
> easily possible, and doesn't require us to keep the _encryption_ key
> secret, just the decryption one.

The downside here is that asymmetric encryption is much more expensive
than symmetric encryption and that probably makes it a non-starter.  I
do think we'll want to support multiple encryption methods and perhaps
we can have an option where asymmetric encryption is used, but that's
not what I expect will be typically used.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Robert Haas
Date:
On Tue, Jun 13, 2017 at 12:23 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Key management is an entirely independent discussion from this and the
> proposal from Ants, as I understand it, is that the key would *not* be
> in the database but could be anywhere that a shell command could get it
> from, including possibly a HSM (hardware device).

Yes.  I think the right way to implement this is something like:

1. Have a GUC that runs a shell command to get the key.

2. If the command successfully gets the key, it prints it to stdout
and returns 0.

3. If it doesn't get successfully get the key, it returns 1.  The
database can retry or give up, whatever we decide to do.

That way, if the user wants to store the key in an unencrypted text
file, they can set the encryption_key_command = 'cat /not/very/secure'
and call it a day.  If they want to prompt the user on the console or
request the key from an HSM or get it in any other way, they just have
to write the appropriate shell script.  We just provide mechanism, not
policy, and the user can adopt any policy they like, from an extremely
insecure policy to one suitable for Fort Knox.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 12:23:01PM -0400, Stephen Frost wrote:
> > As I understand it, having encryption in the database means the key is
> > stored in the database, while having encryption in the file system means
> > the key is stored in the operating system somewhere.  
> 
> Key management is an entirely independent discussion from this and the
> proposal from Ants, as I understand it, is that the key would *not* be
> in the database but could be anywhere that a shell command could get it
> from, including possibly a HSM (hardware device).
> 
> Having the data encrypted by PostgreSQL does not mean the key is stored
> in the database.

Yes, I was just simplifying.

> > Of course, if the
> > key stored in the database is visible to someone using the operating
> > system, we really haven't added much/any security --- I guess my point
> > is that the OS easily can hide the key from the database, but the
> > database can't easily hide the key from the operating system.
> 
> This is correct- the key must be available to the PostgreSQL process
> and therefore someone with privileged access to the OS would be able to
> retrieve the key, but that's also true of filesystem encryption.
> 
> Basically, if the server is doing the encryption and you have the
> ability to read all memory on the server then you can get the key.  Of
> course, if you can read all memory then you can just look at shared
> buffers and you don't really need to bother yourself with the key or
> the encryption, and it doesn't make any difference if you're encrypting
> in the database or in the filesystem.  That attack vector is not one
> which this is intending to address.

My point is that if you have the key accessible to the database server,
both the database server and OS have access to it.  If you store it in
the OS, only the OS has access to it.

> > I have to admit we tend to avoid heavy-API solutions that are designed
> > just to work around deployment challenges.  Commercial databases are
> > fine in doing that, but it leads to very complex products.
> 
> I'm not following what you mean here.

By adding all-cluster encryption, we are re-implementing something the
OS does just fine, in most cases.  We are going to have API overhead to
do it in the database, and historically we have avoided that.

> > One cool idea I have is using public encryption to store the encryption
> > key by users who don't know the decryption key, e.g. RSA.  It would be a
> > write-only encryption option.  Not sure how useful that is, but it
> > easily possible, and doesn't require us to keep the _encryption_ key
> > secret, just the decryption one.
> 
> The downside here is that asymmetric encryption is much more expensive
> than symmetric encryption and that probably makes it a non-starter.  I
> do think we'll want to support multiple encryption methods and perhaps
> we can have an option where asymmetric encryption is used, but that's
> not what I expect will be typically used.

Well, usually the symetric key is stored using RSA and a symetric
cipher is used to encrypt/decrypt the data.  I was thinking of a case
where you encrypt a row using a symetric key, then store RSA-encrypted
versions of the symetric key encrypted that only specific users could
decrypt and get the key to decrypt the data.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Joe Conway
Date:
On 06/13/2017 09:28 AM, Robert Haas wrote:
> On Tue, Jun 13, 2017 at 12:23 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> Key management is an entirely independent discussion from this and the
>> proposal from Ants, as I understand it, is that the key would *not* be
>> in the database but could be anywhere that a shell command could get it
>> from, including possibly a HSM (hardware device).
>
> Yes.  I think the right way to implement this is something like:
>
> 1. Have a GUC that runs a shell command to get the key.
>
> 2. If the command successfully gets the key, it prints it to stdout
> and returns 0.
>
> 3. If it doesn't get successfully get the key, it returns 1.  The
> database can retry or give up, whatever we decide to do.
>
> That way, if the user wants to store the key in an unencrypted text
> file, they can set the encryption_key_command = 'cat /not/very/secure'
> and call it a day.  If they want to prompt the user on the console or
> request the key from an HSM or get it in any other way, they just have
> to write the appropriate shell script.  We just provide mechanism, not
> policy, and the user can adopt any policy they like, from an extremely
> insecure policy to one suitable for Fort Knox.

Agreed, but as Bruce alluded to, we want this to be a master key, which
is in turn used to encrypt the actual key, or keys, that are used to
encrypt the data. The actual data encryption keys could be very long
randomly generated binary, and there could be more than one of them
(e.g. one per tablespace) in a file which is encrypted with the master
key. This is more secure and allows, for example the master key to be
changed without having to decrypt/re-encrypt the entire database.

Joe


--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 09:55:10AM -0700, Joe Conway wrote:
> > That way, if the user wants to store the key in an unencrypted text
> > file, they can set the encryption_key_command = 'cat /not/very/secure'
> > and call it a day.  If they want to prompt the user on the console or
> > request the key from an HSM or get it in any other way, they just have
> > to write the appropriate shell script.  We just provide mechanism, not
> > policy, and the user can adopt any policy they like, from an extremely
> > insecure policy to one suitable for Fort Knox.
> 
> Agreed, but as Bruce alluded to, we want this to be a master key, which
> is in turn used to encrypt the actual key, or keys, that are used to
> encrypt the data. The actual data encryption keys could be very long
> randomly generated binary, and there could be more than one of them
> (e.g. one per tablespace) in a file which is encrypted with the master
> key. This is more secure and allows, for example the master key to be
> changed without having to decrypt/re-encrypt the entire database.

Yes, thank you.  Also, you can make multiple RSA-encrypted copies of the
symetric key, one for each role you want to view the data.  And good
point on the ability to change the RSA key/password without having to
reencrypt the data.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 12:23:01PM -0400, Stephen Frost wrote:
> > > Of course, if the
> > > key stored in the database is visible to someone using the operating
> > > system, we really haven't added much/any security --- I guess my point
> > > is that the OS easily can hide the key from the database, but the
> > > database can't easily hide the key from the operating system.
> >
> > This is correct- the key must be available to the PostgreSQL process
> > and therefore someone with privileged access to the OS would be able to
> > retrieve the key, but that's also true of filesystem encryption.
> >
> > Basically, if the server is doing the encryption and you have the
> > ability to read all memory on the server then you can get the key.  Of
> > course, if you can read all memory then you can just look at shared
> > buffers and you don't really need to bother yourself with the key or
> > the encryption, and it doesn't make any difference if you're encrypting
> > in the database or in the filesystem.  That attack vector is not one
> > which this is intending to address.
>
> My point is that if you have the key accessible to the database server,
> both the database server and OS have access to it.  If you store it in
> the OS, only the OS has access to it.

I understand that, but that's not a particularly interesting distinction
as the database server must, necessairly, have access to the data in the
database.

> > > I have to admit we tend to avoid heavy-API solutions that are designed
> > > just to work around deployment challenges.  Commercial databases are
> > > fine in doing that, but it leads to very complex products.
> >
> > I'm not following what you mean here.
>
> By adding all-cluster encryption, we are re-implementing something the
> OS does just fine, in most cases.  We are going to have API overhead to
> do it in the database, and historically we have avoided that.

We do try to avoid the overhead of additional function calls, in places
where it matters.  As this is all about reading and writing data, the
overhead for the additional check to see if we're doing encryption or
not is unlikely to be interesting.

> > > One cool idea I have is using public encryption to store the encryption
> > > key by users who don't know the decryption key, e.g. RSA.  It would be a
> > > write-only encryption option.  Not sure how useful that is, but it
> > > easily possible, and doesn't require us to keep the _encryption_ key
> > > secret, just the decryption one.
> >
> > The downside here is that asymmetric encryption is much more expensive
> > than symmetric encryption and that probably makes it a non-starter.  I
> > do think we'll want to support multiple encryption methods and perhaps
> > we can have an option where asymmetric encryption is used, but that's
> > not what I expect will be typically used.
>
> Well, usually the symetric key is stored using RSA and a symetric
> cipher is used to encrypt/decrypt the data.  I was thinking of a case
> where you encrypt a row using a symetric key, then store RSA-encrypted
> versions of the symetric key encrypted that only specific users could
> decrypt and get the key to decrypt the data.

This goes back to key management and I agree that it often makes sense
to use RSA or similar to encrypt the symmetric key, and this approach
would allow the user to do so.  That doesn't actually give you a
"write-only" encryption option though, since any user who can decrypt
the symmetric key is able to use the symmetric key for both encryption
and decryption, and someone who only has access to the RSA encryption
key can't actually encrypt the data since they can't access the
symmetric key.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce, Joe,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 09:55:10AM -0700, Joe Conway wrote:
> > > That way, if the user wants to store the key in an unencrypted text
> > > file, they can set the encryption_key_command = 'cat /not/very/secure'
> > > and call it a day.  If they want to prompt the user on the console or
> > > request the key from an HSM or get it in any other way, they just have
> > > to write the appropriate shell script.  We just provide mechanism, not
> > > policy, and the user can adopt any policy they like, from an extremely
> > > insecure policy to one suitable for Fort Knox.
> >
> > Agreed, but as Bruce alluded to, we want this to be a master key, which
> > is in turn used to encrypt the actual key, or keys, that are used to
> > encrypt the data. The actual data encryption keys could be very long
> > randomly generated binary, and there could be more than one of them
> > (e.g. one per tablespace) in a file which is encrypted with the master
> > key. This is more secure and allows, for example the master key to be
> > changed without having to decrypt/re-encrypt the entire database.
>
> Yes, thank you.  Also, you can make multiple RSA-encrypted copies of the
> symetric key, one for each role you want to view the data.  And good
> point on the ability to change the RSA key/password without having to
> reencrypt the data.

There's nothing in this proposal that prevents the user from using a
very long randomly generated binary key.  We aren't talking about
prompting the user for a password unless that's what they decide the
shell script should do, unless the user decides to do that and if they
do then that's their choice.

Let us, please, stop stressing over the right way to do key management
as part of this discussion about providing encryption.  The two are
different things and we do not need to solve both at once.

Further, yes, we will definitely want to get to a point where we can
encrypt subsets of the system in different ways, but that doesn't have
to be done in the first implementation either.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Joe Conway
Date:
On 06/13/2017 10:05 AM, Stephen Frost wrote:
> Bruce, Joe,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
>> On Tue, Jun 13, 2017 at 09:55:10AM -0700, Joe Conway wrote:
>> > > That way, if the user wants to store the key in an unencrypted text
>> > > file, they can set the encryption_key_command = 'cat /not/very/secure'
>> > > and call it a day.  If they want to prompt the user on the console or
>> > > request the key from an HSM or get it in any other way, they just have
>> > > to write the appropriate shell script.  We just provide mechanism, not
>> > > policy, and the user can adopt any policy they like, from an extremely
>> > > insecure policy to one suitable for Fort Knox.
>> >
>> > Agreed, but as Bruce alluded to, we want this to be a master key, which
>> > is in turn used to encrypt the actual key, or keys, that are used to
>> > encrypt the data. The actual data encryption keys could be very long
>> > randomly generated binary, and there could be more than one of them
>> > (e.g. one per tablespace) in a file which is encrypted with the master
>> > key. This is more secure and allows, for example the master key to be
>> > changed without having to decrypt/re-encrypt the entire database.
>>
>> Yes, thank you.  Also, you can make multiple RSA-encrypted copies of the
>> symetric key, one for each role you want to view the data.  And good
>> point on the ability to change the RSA key/password without having to
>> reencrypt the data.
>
> There's nothing in this proposal that prevents the user from using a
> very long randomly generated binary key.  We aren't talking about
> prompting the user for a password unless that's what they decide the
> shell script should do, unless the user decides to do that and if they
> do then that's their choice.

Except shell escaping issues, etc, etc

> Let us, please, stop stressing over the right way to do key management
> as part of this discussion about providing encryption.  The two are
> different things and we do not need to solve both at once.

Not stressing, but this is an important part of the design and should be
done correctly. It is also very simple, so should not be hard to add.

> Further, yes, we will definitely want to get to a point where we can
> encrypt subsets of the system in different ways, but that doesn't have
> to be done in the first implementation either.

No, it doesn't, but that doesn't change the utility of doing it this way
from the start.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 01:01:32PM -0400, Stephen Frost wrote:
> > Well, usually the symetric key is stored using RSA and a symetric
> > cipher is used to encrypt/decrypt the data.  I was thinking of a case
> > where you encrypt a row using a symetric key, then store RSA-encrypted
> > versions of the symetric key encrypted that only specific users could
> > decrypt and get the key to decrypt the data.
> 
> This goes back to key management and I agree that it often makes sense
> to use RSA or similar to encrypt the symmetric key, and this approach
> would allow the user to do so.  That doesn't actually give you a
> "write-only" encryption option though, since any user who can decrypt
> the symmetric key is able to use the symmetric key for both encryption
> and decryption, and someone who only has access to the RSA encryption
> key can't actually encrypt the data since they can't access the
> symmetric key.
I think the big win of Postgres doing the encryption is that the
user-visible file system is no longer a target (assuming OS permissions
are bypassed), while for file system encryption it is the storage device
that is encrypted.

My big question is how many times are the OS permissions bypassed in a
way that would also not expose the db clusters key or db data?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Joe,

* Joe Conway (mail@joeconway.com) wrote:
> Except shell escaping issues, etc, etc

That's not an issue- we're talking about reading the stdout of some
other process, there's no shell escaping that has to be done there.

> > Let us, please, stop stressing over the right way to do key management
> > as part of this discussion about providing encryption.  The two are
> > different things and we do not need to solve both at once.
>
> Not stressing, but this is an important part of the design and should be
> done correctly. It is also very simple, so should not be hard to add.

I disagree that proper key management is "simple".  If we really get to
a point where we think we have a simple answer to it then perhaps that
can be implemented in addition to the encryption piece in the same
release cycle- but they certainly don't need to be in the same patch,
nor do we need to make good key management a requirement for adding
encryption support.

> > Further, yes, we will definitely want to get to a point where we can
> > encrypt subsets of the system in different ways, but that doesn't have
> > to be done in the first implementation either.
>
> No, it doesn't, but that doesn't change the utility of doing it this way
> from the start.

No, but it seriously changes the level of complexity.  I feel like we're
trying to go from zero to light speed here because there's an idea that
it's "simple" to add X, Y or Z additional requirement beyond the basic
feature, but we don't have anything yet.  I continue to be of the
feeling that we should start simple and keep it to the basic feature
first and make sure that we can actually get that right before we start
looking into adding on additional bits.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 01:01:32PM -0400, Stephen Frost wrote:
> > > Well, usually the symetric key is stored using RSA and a symetric
> > > cipher is used to encrypt/decrypt the data.  I was thinking of a case
> > > where you encrypt a row using a symetric key, then store RSA-encrypted
> > > versions of the symetric key encrypted that only specific users could
> > > decrypt and get the key to decrypt the data.
> >
> > This goes back to key management and I agree that it often makes sense
> > to use RSA or similar to encrypt the symmetric key, and this approach
> > would allow the user to do so.  That doesn't actually give you a
> > "write-only" encryption option though, since any user who can decrypt
> > the symmetric key is able to use the symmetric key for both encryption
> > and decryption, and someone who only has access to the RSA encryption
> > key can't actually encrypt the data since they can't access the
> > symmetric key.
>
> I think the big win of Postgres doing the encryption is that the
> user-visible file system is no longer a target (assuming OS permissions
> are bypassed), while for file system encryption it is the storage device
> that is encrypted.

If OS permissions are bypassed then the encryption isn't going to help
because the attacker can just access shared memory.

The big wins for doing the encryption in PostgreSQL are, as Robert and I
have both mentioned on this thread already, that it provides
data-at-rest encryption in an easier to deploy fashion which will work
the same across different systems and allows the encrypted cluster to be
transferred more easily between systems.  There are almsot certainly
other wins from having PG do the encryption, but the above strikes me as
the big ones, and those are certainly valuable enough on their own for
us to seriously consider adding this capability.

> My big question is how many times are the OS permissions bypassed in a
> way that would also not expose the db clusters key or db data?

This is not the attack vector that this solution is attempting to
address, so there really isn't much point in discussing it on this
thread.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Joe Conway
Date:
On 06/13/2017 10:20 AM, Stephen Frost wrote:
> * Joe Conway (mail@joeconway.com) wrote:
>> Except shell escaping issues, etc, etc
>
> That's not an issue- we're talking about reading the stdout of some
> other process, there's no shell escaping that has to be done there.

It could be an issue depending on how the user stores their master key.

> I disagree that proper key management is "simple".  If we really get to
> a point where we think we have a simple answer to it then perhaps that
> can be implemented in addition to the encryption piece in the same
> release cycle- but they certainly don't need to be in the same patch,
> nor do we need to make good key management a requirement for adding
> encryption support.

I never said key management was simple. Indeed it is the most complex
and hazardous part of all this as you said earlier. What is simple is
implementing a master key encrypting actual keys scheme. Keeping the
user's master key management out of this design is unchanged by what I
proposed, and what I proposed is a superior yet simple method. Yes, it
can be done separately but what is the point? We should at least discuss
it as part of the design.

> No, but it seriously changes the level of complexity.  I feel like we're
> trying to go from zero to light speed here because there's an idea that
> it's "simple" to add X, Y or Z additional requirement beyond the basic
> feature, but we don't have anything yet.

I think that is hyperbole. It does not significantly add to the
complexity of what is being discussed.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 01:25:00PM -0400, Stephen Frost wrote:
> > I think the big win of Postgres doing the encryption is that the
> > user-visible file system is no longer a target (assuming OS permissions
> > are bypassed), while for file system encryption it is the storage device
> > that is encrypted.
> 
> If OS permissions are bypassed then the encryption isn't going to help
> because the attacker can just access shared memory.
> 
> The big wins for doing the encryption in PostgreSQL are, as Robert and I
> have both mentioned on this thread already, that it provides
> data-at-rest encryption in an easier to deploy fashion which will work
> the same across different systems and allows the encrypted cluster to be
> transferred more easily between systems.  There are almsot certainly
> other wins from having PG do the encryption, but the above strikes me as
> the big ones, and those are certainly valuable enough on their own for
> us to seriously consider adding this capability.

Since you seem to be trying to shut down discussion, I will simply say I
am unimpressed that this use-case is sufficient justification to add the
feature.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Joe,

* Joe Conway (mail@joeconway.com) wrote:
> On 06/13/2017 10:20 AM, Stephen Frost wrote:
> > * Joe Conway (mail@joeconway.com) wrote:
> >> Except shell escaping issues, etc, etc
> >
> > That's not an issue- we're talking about reading the stdout of some
> > other process, there's no shell escaping that has to be done there.
>
> It could be an issue depending on how the user stores their master key.

... eh?  The user gives us a command to run, we run it, it spits out
some binary blob to stdout which we read in and use as the key.  I don't
see where in that there's any need to be concerned about shell escaping
issues.

> > I disagree that proper key management is "simple".  If we really get to
> > a point where we think we have a simple answer to it then perhaps that
> > can be implemented in addition to the encryption piece in the same
> > release cycle- but they certainly don't need to be in the same patch,
> > nor do we need to make good key management a requirement for adding
> > encryption support.
>
> I never said key management was simple. Indeed it is the most complex
> and hazardous part of all this as you said earlier. What is simple is
> implementing a master key encrypting actual keys scheme. Keeping the
> user's master key management out of this design is unchanged by what I
> proposed, and what I proposed is a superior yet simple method. Yes, it
> can be done separately but what is the point? We should at least discuss
> it as part of the design.

The point is that we haven't got any encryption of any kind and you're
suggesting we introduce key management which you agree isn't simple.
That you're trying to argue that it actually is simple because it's just
<<description of a key management system>> is a bit bizarre to me.

> > No, but it seriously changes the level of complexity.  I feel like we're
> > trying to go from zero to light speed here because there's an idea that
> > it's "simple" to add X, Y or Z additional requirement beyond the basic
> > feature, but we don't have anything yet.
>
> I think that is hyperbole. It does not significantly add to the
> complexity of what is being discussed.

If I stipulate that it's, indeed, simple to implement a system where we
have a master key and other keys- where are those other keys going to
kept (even if they're encrypted)?  How many extra keys are we talking
about?  When are those keys going to be used and how do we know what key
to use when?  If we're going to do per-tablespace or per-table
encryption, how are we going to handle the WAL for that?  Will we have
an independent key for WAL (in which case, what's the point of using
different keys for tables, et al, when all the data is in the WAL?)?

Having a single key which is used cluster-wide is extremely simple and
lets us get some form of encryption first before we try to tackle the
more complicated multi-key/partial-encryption system.

Just to be clear, I don't have any issue with discussing the idea that
we want to get to a point where we can work with multiple keys and
encrypt different tables with different keys (or not encrypt certain
tables, et al) with the goal of implementing the single-key approach in
a way that allows us to expand on it down the road easily, I just don't
think we need to have it all done in the very first patch which adds the
ability to encrypt the data files.  Maybe you're not saying that it has
to be included in the first implementation, in which case we seem to
just be talking past each other, but that isn't the impression I got..

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 01:44:51PM -0400, Stephen Frost wrote:
> Just to be clear, I don't have any issue with discussing the idea that
> we want to get to a point where we can work with multiple keys and
> encrypt different tables with different keys (or not encrypt certain
> tables, et al) with the goal of implementing the single-key approach in
> a way that allows us to expand on it down the road easily, I just don't
> think we need to have it all done in the very first patch which adds the
> ability to encrypt the data files.  Maybe you're not saying that it has
> to be included in the first implementation, in which case we seem to
> just be talking past each other, but that isn't the impression I got..

We don't want to implement all-cluster encryption with a simple user API
and then realize we need another API for later encryption features, do
we?  And we are not going to know that if we don't talk about it, but
hey, this is just an email thread and I can marshal opposition to the
feature later when it appears, and point this all out again.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 01:44:51PM -0400, Stephen Frost wrote:
> > Just to be clear, I don't have any issue with discussing the idea that
> > we want to get to a point where we can work with multiple keys and
> > encrypt different tables with different keys (or not encrypt certain
> > tables, et al) with the goal of implementing the single-key approach in
> > a way that allows us to expand on it down the road easily, I just don't
> > think we need to have it all done in the very first patch which adds the
> > ability to encrypt the data files.  Maybe you're not saying that it has
> > to be included in the first implementation, in which case we seem to
> > just be talking past each other, but that isn't the impression I got..
>
> We don't want to implement all-cluster encryption with a simple user API
> and then realize we need another API for later encryption features, do
> we?

I actually suspect that's exactly where we'll end up- but due to
necessity rather than because there's a way to avoid it.  We are going
to want to encrypt cluster-wide components of the system (shared
catalogs, WAL, etc) and that means that we have to have a key provided
very early on.  That's a very different thing from being able to do
something like encrypt specific tables, tablespaces, etc, where the key
can be provided much later and we'll want to allow users to use SQL or
the libpq protocol to be able to specify what to encrypt and possibly
even provide the encryption keys.

That said, the approach outlined here could be used for both by
expanding on the command string, perhaps passing it a keyid which is
what we store in the catalog to indicate what key a table is encrypted
with and then the keyid is "%k" in the command string and the command
has to return the specified key for us to decrypt the table.  That would
involve adding a new catalog table to identify keys and their keyids,
I'd think, and an additional column in pg_class which specifies the key
(or perhaps we'd just have a new catalog table that says what tables are
encrypted in what way).

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 01:25:00PM -0400, Stephen Frost wrote:
> > > I think the big win of Postgres doing the encryption is that the
> > > user-visible file system is no longer a target (assuming OS permissions
> > > are bypassed), while for file system encryption it is the storage device
> > > that is encrypted.
> >
> > If OS permissions are bypassed then the encryption isn't going to help
> > because the attacker can just access shared memory.
> >
> > The big wins for doing the encryption in PostgreSQL are, as Robert and I
> > have both mentioned on this thread already, that it provides
> > data-at-rest encryption in an easier to deploy fashion which will work
> > the same across different systems and allows the encrypted cluster to be
> > transferred more easily between systems.  There are almsot certainly
> > other wins from having PG do the encryption, but the above strikes me as
> > the big ones, and those are certainly valuable enough on their own for
> > us to seriously consider adding this capability.
>
> Since you seem to be trying to shut down discussion, I will simply say I
> am unimpressed that this use-case is sufficient justification to add the
> feature.

I'm not trying to shut down discussion, I'm simply pointing out where
this feature will be helpful and where it won't be.  If there's a way to
make it better and able to address an attack where the OS permission
system is bypassed, that'd be great, but I certainly don't know of any
way to do that and we don't want to claim that this feature will protect
against an attack vector that it won't.

If the lack of that means you don't support the feature, that's
unfortunate as it seems to imply, to me at least, that we'll never have
any kind of encryption because there's no way for it to prevent attacks
where the OS permission system is able to be bypassed.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 02:23:39PM -0400, Stephen Frost wrote:
> I'm not trying to shut down discussion, I'm simply pointing out where
> this feature will be helpful and where it won't be.  If there's a way to
> make it better and able to address an attack where the OS permission
> system is bypassed, that'd be great, but I certainly don't know of any
> way to do that and we don't want to claim that this feature will protect
> against an attack vector that it won't.
> 
> If the lack of that means you don't support the feature, that's
> unfortunate as it seems to imply, to me at least, that we'll never have
> any kind of encryption because there's no way for it to prevent attacks
> where the OS permission system is able to be bypassed.

It means if we can't discuss the actual benefits that this feature
brings, and doesn't bring, and how it will deal with future feature
additions, then you are right we will never have it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 02:23:39PM -0400, Stephen Frost wrote:
> > I'm not trying to shut down discussion, I'm simply pointing out where
> > this feature will be helpful and where it won't be.  If there's a way to
> > make it better and able to address an attack where the OS permission
> > system is bypassed, that'd be great, but I certainly don't know of any
> > way to do that and we don't want to claim that this feature will protect
> > against an attack vector that it won't.
> >
> > If the lack of that means you don't support the feature, that's
> > unfortunate as it seems to imply, to me at least, that we'll never have
> > any kind of encryption because there's no way for it to prevent attacks
> > where the OS permission system is able to be bypassed.
>
> It means if we can't discuss the actual benefits that this feature
> brings, and doesn't bring, and how it will deal with future feature
> additions, then you are right we will never have it.

I apologize for having come across as trying to shut down discussion,
that was not my intent.

It's good to discuss what the feature would bring and what cases it
doesn't cover, as well as discussing how it can be designed to make sure
that later improvements are able to be done without having to change it
around.  I do think it's a good idea for us to consider taking an
incremental approach where we're adding pieces and building things up as
we go.  I'm concerned that if we try to do too much in the initial
implementation that we'll end up not having anything.

As it relates to the different attack vectors that this would address,
it's primairly the same ones which filesystem-level encryption also
addresses, but it's an improvement when it comes to ease of use.
Unfortunately, it won't address cases where the OS is compromised.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 02:38:58PM -0400, Stephen Frost wrote:
> It's good to discuss what the feature would bring and what cases it
> doesn't cover, as well as discussing how it can be designed to make sure
> that later improvements are able to be done without having to change it
> around.  I do think it's a good idea for us to consider taking an
> incremental approach where we're adding pieces and building things up as
> we go.  I'm concerned that if we try to do too much in the initial
> implementation that we'll end up not having anything.
> 
> As it relates to the different attack vectors that this would address,
> it's primairly the same ones which filesystem-level encryption also
> addresses, but it's an improvement when it comes to ease of use.
> Unfortunately, it won't address cases where the OS is compromised.

OK, so let's go back.  You are saying there are no security benefits to
this vs. file system encryption.  The benefit is allowing configuration
in the database rather than the OS?  You stated you can transfer
db-level encrypted files between servers, but can't you do that anyway? 
Is the problem that you have to encrypt before sending and decrypt on
arrival, if you don't trust the transmission link?  Is that used a lot? 
Is having the db encrypt every write a reasonable solution to that?

As far as future features, we don't have to add the all features at this
time, but if someone has a good idea for an API and we can make it work
easily while adding this feature, why wouldn't we do that?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 02:38:58PM -0400, Stephen Frost wrote:
> > It's good to discuss what the feature would bring and what cases it
> > doesn't cover, as well as discussing how it can be designed to make sure
> > that later improvements are able to be done without having to change it
> > around.  I do think it's a good idea for us to consider taking an
> > incremental approach where we're adding pieces and building things up as
> > we go.  I'm concerned that if we try to do too much in the initial
> > implementation that we'll end up not having anything.
> >
> > As it relates to the different attack vectors that this would address,
> > it's primairly the same ones which filesystem-level encryption also
> > addresses, but it's an improvement when it comes to ease of use.
> > Unfortunately, it won't address cases where the OS is compromised.
>
> OK, so let's go back.  You are saying there are no security benefits to
> this vs. file system encryption.

I'm not sure that I can see any, myself..  Perhaps I'm wrong there, but
it seems unlikely that this would be an improvement over filesystem
level encryption in the general sense.  I'm not sure that I see it as
really worse than filesystem-level encryption either, to be clear.
There's a bit of increased information leakage, as Peter mentioned and I
agreed with, but it's not a lot and I expect in most cases that
information leak would be acceptable.  That seems like something which
would need to be considered on a case-by-case basis.

> The benefit is allowing configuration
> in the database rather than the OS?

No, the benefit is that the database administrator can configure it and
set it up and not have to get an OS-level administrator involved.  There
may also be other reasons why filesystem-level encryption is difficult
to set up or use in a certain environment, but this wouldn't depend on
anything OS-related and therefore could be done.

> You stated you can transfer
> db-level encrypted files between servers, but can't you do that anyway?

If the filesystem is encrypted and you wanted to transfer the entire
cluster from one system to another, keeping it encrypted with the same
key, you'd have to transfer the entire filesystem at a block level.
That's not typically very easy to do (ZFS, specifically, has this
capability where you can export a filesystem and send it from one
machine to another, but I don't know of any other filesystems which do
and ZFS isn't always an option..).

You could go through a process of re-encrypting the files prior to
transferring them, or deciding that simply having the transport
mechanism encrypted is sufficient (eg: SSH), but if what you really want
to do is keep the existing encryption of the database and transfer it to
another system, this allows that pretty easily.

For example, you could simply do:

cp -a /path/to/PG /mnt/usb

and you're done.  If you're using filesystem level encryption then you'd
have to re-encrypt the data, using something like:

tar -cf - /path/to/PG | openssl -key private.key > /mnt/usb/encrypted_cluster.tar

And then you would need openssl on the other system to decrypt it.

Of course, either way you'd have to provide for a way to get the key
from one system to the other.

Also, tools like pg_basebackup could be used, with nearly zero changes,
I think, to get an encrypted copy of the database for backup purposes,
removing the need to work out a way to handle encrypting backups.

> Is the problem that you have to encrypt before sending and decrypt on
> arrival, if you don't trust the transmission link?  Is that used a lot?
> Is having the db encrypt every write a reasonable solution to that?

There's multiple use-cases here.  Making it easier to copy the database
is just one of them and it isn't the biggest one.  The biggest benefit
is that there's cases where filesystem-level encryption isn't really an
option or, even if it is, it's not desirable for other reasons.

> As far as future features, we don't have to add the all features at this
> time, but if someone has a good idea for an API and we can make it work
> easily while adding this feature, why wouldn't we do that?

Sure, I'm all for specific suggestions about how to improve the API, or
just in general recommendations of how to improve the patch.  The
suggestions which have been made about key management don't really come
across to me as specific API-level recommendations but rather "this
would also be nice to have" kind of comments, which isn't really the
same.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/13/17 15:20, Stephen Frost wrote:
> No, the benefit is that the database administrator can configure it and
> set it up and not have to get an OS-level administrator involved.  There
> may also be other reasons why filesystem-level encryption is difficult
> to set up or use in a certain environment, but this wouldn't depend on
> anything OS-related and therefore could be done.

Let's see a proposal in those terms then.  How easy can you make it,
compared to existing OS-level solutions, and will that justify the
maintenance overhead?

Considering how ubiquitous file-system encryption is, I have my doubts
that the trade-offs will come out right, but let's see.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/13/17 15:20, Stephen Frost wrote:
> For example, you could simply do: 
> 
> cp -a /path/to/PG /mnt/usb
> 
> and you're done.  If you're using filesystem level encryption then you'd
> have to re-encrypt the data, using something like:
> 
> tar -cf - /path/to/PG | openssl -key private.key > /mnt/usb/encrypted_cluster.tar
> 
> And then you would need openssl on the other system to decrypt it.

Or make the USB file system encrypted as well?  If you're in that kind
of environment, that would surely be feasible, if not required.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 03:20:12PM -0400, Stephen Frost wrote:
> Bruce,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Tue, Jun 13, 2017 at 02:38:58PM -0400, Stephen Frost wrote:
> > > It's good to discuss what the feature would bring and what cases it
> > > doesn't cover, as well as discussing how it can be designed to make sure
> > > that later improvements are able to be done without having to change it
> > > around.  I do think it's a good idea for us to consider taking an
> > > incremental approach where we're adding pieces and building things up as
> > > we go.  I'm concerned that if we try to do too much in the initial
> > > implementation that we'll end up not having anything.
> > > 
> > > As it relates to the different attack vectors that this would address,
> > > it's primairly the same ones which filesystem-level encryption also
> > > addresses, but it's an improvement when it comes to ease of use.
> > > Unfortunately, it won't address cases where the OS is compromised.
> > 
> > OK, so let's go back.  You are saying there are no security benefits to
> > this vs. file system encryption.
> 
> I'm not sure that I can see any, myself..  Perhaps I'm wrong there, but
> it seems unlikely that this would be an improvement over filesystem
> level encryption in the general sense.  I'm not sure that I see it as
> really worse than filesystem-level encryption either, to be clear.
> There's a bit of increased information leakage, as Peter mentioned and I
> agreed with, but it's not a lot and I expect in most cases that
> information leak would be acceptable.  That seems like something which
> would need to be considered on a case-by-case basis.

Isn't the leakage controlled by OS permissions, so is it really leakage,
i.e., if you can see the leakage, you probably have bypassed the OS
permissions and see the key and data anyway.

> > The benefit is allowing configuration
> > in the database rather than the OS?
> 
> No, the benefit is that the database administrator can configure it and
> set it up and not have to get an OS-level administrator involved.  There
> may also be other reasons why filesystem-level encryption is difficult
> to set up or use in a certain environment, but this wouldn't depend on
> anything OS-related and therefore could be done.

OK, my only point here is that we are going down a slippery slope of
implementing OS things in the database.  There is nothing wrong with
that but it has often been something we have avoided, because of the
added complexity required in the db server.

As a counter-example, we only added an external collation library to
Postgres when we clearly saw a benefit, e.g. detecting collation
changes.

> > You stated you can transfer
> > db-level encrypted files between servers, but can't you do that anyway? 
> 
> If the filesystem is encrypted and you wanted to transfer the entire
> cluster from one system to another, keeping it encrypted with the same
> key, you'd have to transfer the entire filesystem at a block level.
> That's not typically very easy to do (ZFS, specifically, has this
> capability where you can export a filesystem and send it from one
> machine to another, but I don't know of any other filesystems which do
> and ZFS isn't always an option..).
>
> You could go through a process of re-encrypting the files prior to
> transferring them, or deciding that simply having the transport
> mechanism encrypted is sufficient (eg: SSH), but if what you really want
> to do is keep the existing encryption of the database and transfer it to
> another system, this allows that pretty easily.
> 
> For example, you could simply do: 
> 
> cp -a /path/to/PG /mnt/usb
> 
> and you're done.  If you're using filesystem level encryption then you'd
> have to re-encrypt the data, using something like:
> 
> tar -cf - /path/to/PG | openssl -key private.key > /mnt/usb/encrypted_cluster.tar
> 
> And then you would need openssl on the other system to decrypt it.
> 
> Of course, either way you'd have to provide for a way to get the key
> from one system to the other.

Uh, doesn't scp do this?  I have trouble seeing how avoiding calling
openssl justifies changes to our database server.

> Also, tools like pg_basebackup could be used, with nearly zero changes,
> I think, to get an encrypted copy of the database for backup purposes,
> removing the need to work out a way to handle encrypting backups.

I do think we need much more documentation on how to encrypt things,
though that is a separate issue.  It might help to document how you
_should_ do things now to see the limitations we have.

> > Is the problem that you have to encrypt before sending and decrypt on
> > arrival, if you don't trust the transmission link?  Is that used a lot? 
> > Is having the db encrypt every write a reasonable solution to that?
> 
> There's multiple use-cases here.  Making it easier to copy the database
> is just one of them and it isn't the biggest one.  The biggest benefit
> is that there's cases where filesystem-level encryption isn't really an
> option or, even if it is, it's not desirable for other reasons.

I am thinking NAS storage you don't trust, though there is the leakage
there.

> > As far as future features, we don't have to add the all features at this
> > time, but if someone has a good idea for an API and we can make it work
> > easily while adding this feature, why wouldn't we do that?
> 
> Sure, I'm all for specific suggestions about how to improve the API, or
> just in general recommendations of how to improve the patch.  The
> suggestions which have been made about key management don't really come
> across to me as specific API-level recommendations but rather "this
> would also be nice to have" kind of comments, which isn't really the
> same.

They are related, or you can't even know that because you are/were
preventing the discussion from happening.  As an example, full-cluster
encryption doesn't really add any new capabilities, maybe just easier
deployment for some users.  Where things get useful is fine-grained
encryption, which file system-level encryption can't do.

Also, has anyone asked users if they would find db-encryption better
than file system encryption?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/13/17 15:51, Bruce Momjian wrote:
> Isn't the leakage controlled by OS permissions, so is it really leakage,
> i.e., if you can see the leakage, you probably have bypassed the OS
> permissions and see the key and data anyway.

One scenario (among many) is when you're done with the disk.  If the
content was fully encrypted, then you can just throw it into the trash
or have your provider dispose of it or reuse it.  If not, then,
depending on policy, you will have to physically obtain it and burn it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 04:08:29PM -0400, Peter Eisentraut wrote:
> On 6/13/17 15:51, Bruce Momjian wrote:
> > Isn't the leakage controlled by OS permissions, so is it really leakage,
> > i.e., if you can see the leakage, you probably have bypassed the OS
> > permissions and see the key and data anyway.
> 
> One scenario (among many) is when you're done with the disk.  If the
> content was fully encrypted, then you can just throw it into the trash
> or have your provider dispose of it or reuse it.  If not, then,
> depending on policy, you will have to physically obtain it and burn it.

Oh, I see your point --- db-level encryption stores the file system as
mountable on the device, while it is not with storage-level encryption
--- got it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
David Fetter
Date:
On Tue, Jun 13, 2017 at 10:28:14AM -0400, Peter Eisentraut wrote:
> On 6/13/17 09:24, Stephen Frost wrote:
> > but there are use-cases where it'd be really nice to be able to
> > have PG doing the encryption instead of the filesystem because
> > then you can do things like backup the database, copy it somewhere
> > else directly, and then restore it using the regular PG
> > mechanisms, as long as you have access to the key.  That's not
> > something you can directly do with filesystem-level encryption
> 
> Interesting point.
> 
> I wonder what the proper extent of "encryption at rest" should be.
> If you encrypt just on a file or block level, then someone looking
> at the data directory or a backup can still learn a number of things
> about the number of tables, transaction rates, various configuration
> settings, and so on.

In the end, information leaks at a strictly positive baud rate because
physics (cf. Claude Shannon, et al).

Encryption at rest is one technique whereby people can slow this rate,
but there's no such thing as getting it to zero.  Let's not creep this
feature in the ultimately futile attempt to do so.

> In the scenario of a sensitive application hosted on a shared
> SAN, I don't think that is good enough.
> 
> Also, in the use case you describe, if you use pg_basebackup to make a
> direct encrypted copy of a data directory, I think that would mean you'd
> have to keep using the same key for all copies.

Right.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 6/13/17 15:20, Stephen Frost wrote:
> > No, the benefit is that the database administrator can configure it and
> > set it up and not have to get an OS-level administrator involved.  There
> > may also be other reasons why filesystem-level encryption is difficult
> > to set up or use in a certain environment, but this wouldn't depend on
> > anything OS-related and therefore could be done.
>
> Let's see a proposal in those terms then.  How easy can you make it,
> compared to existing OS-level solutions, and will that justify the
> maintenance overhead?

From the original post on this thread, which included a WIP patch:

----------------------------------
Usage
=====

Set up database like so:
   (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;    export PGENCRYPTIONKEY    initdb -k -K pgcrypto $PGDATA
)

Start PostgreSQL:
   (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;    export PGENCRYPTIONKEY    postgres $PGDATA )
----------------------------------

That certainly seems very straight-forward to me, though I expect that
packagers would probably improve upon it further.

> Considering how ubiquitous file-system encryption is, I have my doubts
> that the trade-offs will come out right, but let's see.

There's definitely environments out there where DBAs aren't able to have
root privileges and that limits what they're able to do.  I'm not really
sure how to objectively weigh "you don't need to be root to encrypt the
database" vs. maintenance overhead of this feature.  Subjectively, for
my 2c anyway, it seems well worth it, but that's naturally subjective.
:)

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 6/13/17 15:20, Stephen Frost wrote:
> > And then you would need openssl on the other system to decrypt it.
>
> Or make the USB file system encrypted as well?  If you're in that kind
> of environment, that would surely be feasible, if not required.

Right, but requiring file system encryption to work on a USB stick
across different types of systems strikes me as actually a higher bar
than requiring openssl to exist on both the source and destination
sides.

Naturally, if the environment you're in has already solved that problem
across the enterprise then it's a good approach, although you might want
to use a different encryption key, perhaps, though hopefully that's
something you'd be able to do pretty easily too.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 03:20:12PM -0400, Stephen Frost wrote:
> > > OK, so let's go back.  You are saying there are no security benefits to
> > > this vs. file system encryption.
> >
> > I'm not sure that I can see any, myself..  Perhaps I'm wrong there, but
> > it seems unlikely that this would be an improvement over filesystem
> > level encryption in the general sense.  I'm not sure that I see it as
> > really worse than filesystem-level encryption either, to be clear.
> > There's a bit of increased information leakage, as Peter mentioned and I
> > agreed with, but it's not a lot and I expect in most cases that
> > information leak would be acceptable.  That seems like something which
> > would need to be considered on a case-by-case basis.
>
> Isn't the leakage controlled by OS permissions, so is it really leakage,
> i.e., if you can see the leakage, you probably have bypassed the OS
> permissions and see the key and data anyway.

The case I'm mainly considering is if you somehow lose control over the
medium in which the encrypted database resides- that is, someone steals
the hard drive, or perhaps the hard drive is sold without properly being
wiped first, things like that.

In such a case, there's no OS permissions to bypass because the OS is
now controlled by the attacker.  In that case, if the key wasn't stored
on the hard drive, then the attacker would be able to see the contents
of the filesystem and the associated metadata, but not the contents of
the cluster.

In that case, the distinction between filesystem-level encryption and
PG-level encryption is that with filesystem-level encryption the
attacker wouldn't be able to even see what files exist or any metadata
about them, whereas with PG-level encryption that information would be
available to the attacker.

In terms of an online attack where an attacker has gained access to the
system then, in general, you're right that if the attacker is able to
see into the PG data directory at all then they've figured out a way to
bypass the OS-level permissions and would then be able to see the data
directly anyway.  That's a different scenario which would most likely be
helped by something like SELinux being used, which could prevent the
attacker from being able to look at the PG data directory because the
attacker has connected to the system from a network which isn't allowed
to directly access those files.

> > > The benefit is allowing configuration
> > > in the database rather than the OS?
> >
> > No, the benefit is that the database administrator can configure it and
> > set it up and not have to get an OS-level administrator involved.  There
> > may also be other reasons why filesystem-level encryption is difficult
> > to set up or use in a certain environment, but this wouldn't depend on
> > anything OS-related and therefore could be done.
>
> OK, my only point here is that we are going down a slippery slope of
> implementing OS things in the database.  There is nothing wrong with
> that but it has often been something we have avoided, because of the
> added complexity required in the db server.

I'm not sure that I actually agree that encryption is really solely an
OS-level function, or even that encryption at rest is solely the OS's
job.  As a counter-example, I encrypt my SSH keys and GPG keys
routinely, even when I'm using OS-level filesystem encryption.  Perhaps
that's excessive of me, but, well, I don't find it so. :)

> As a counter-example, we only added an external collation library to
> Postgres when we clearly saw a benefit, e.g. detecting collation
> changes.

Right, and there's also the potential for adding more flexibility down
the road, which I'm certainly all for, but I see value in having even
this initial version of the feature too.

> > Of course, either way you'd have to provide for a way to get the key
> > from one system to the other.
>
> Uh, doesn't scp do this?  I have trouble seeing how avoiding calling
> openssl justifies changes to our database server.

scp may not be an option as it requires network connectivity between the
systems.  This is also just one of the use-cases, and not the main
reason, at least in my view, to add this feature.

> > Also, tools like pg_basebackup could be used, with nearly zero changes,
> > I think, to get an encrypted copy of the database for backup purposes,
> > removing the need to work out a way to handle encrypting backups.
>
> I do think we need much more documentation on how to encrypt things,
> though that is a separate issue.  It might help to document how you
> _should_ do things now to see the limitations we have.

Improving our documentation would certainly be good, but I'm not sure
that we can really recommend specific ways of doing things like
filesystem-level encryption, as that's really OS-dependent and there
could be trade-offs in different ways a given OS might provide that
capability.  I'm not sure that having our documentation generically say
"you should use filesystem-level encryption" would really be very
helpful.

Perhaps I misunderstood your suggestion here though..?

> > > Is the problem that you have to encrypt before sending and decrypt on
> > > arrival, if you don't trust the transmission link?  Is that used a lot?
> > > Is having the db encrypt every write a reasonable solution to that?
> >
> > There's multiple use-cases here.  Making it easier to copy the database
> > is just one of them and it isn't the biggest one.  The biggest benefit
> > is that there's cases where filesystem-level encryption isn't really an
> > option or, even if it is, it's not desirable for other reasons.
>
> I am thinking NAS storage you don't trust, though there is the leakage
> there.

Yes, NAS or SAN storage where you don't want the NAS/SAN administrators
to have access to the data is a good example of where encryption would
be useful.  Of course, either filesystem-level or PG-level encryption
would address that, but with the trade-off that PG-level encryption
would allow the NAS/SAN administrators to see the file metadata, as
discussed above.

> > > As far as future features, we don't have to add the all features at this
> > > time, but if someone has a good idea for an API and we can make it work
> > > easily while adding this feature, why wouldn't we do that?
> >
> > Sure, I'm all for specific suggestions about how to improve the API, or
> > just in general recommendations of how to improve the patch.  The
> > suggestions which have been made about key management don't really come
> > across to me as specific API-level recommendations but rather "this
> > would also be nice to have" kind of comments, which isn't really the
> > same.
>
> They are related, or you can't even know that because you are/were
> preventing the discussion from happening.  As an example, full-cluster
> encryption doesn't really add any new capabilities, maybe just easier
> deployment for some users.  Where things get useful is fine-grained
> encryption, which file system-level encryption can't do.

I agree that having fine-grained control would be helpful, but as I was
trying to get at, that's going to involve new catalog tables and SQL
syntax, and we'd still need to address how to encrypt the cluster-wide
files anyway, which is what this patch is working to do.

> Also, has anyone asked users if they would find db-encryption better
> than file system encryption?

I've been asked for this capability multiple times from our users and
have generally pushed back and encouraged filesystem-level encryption.
That hasn't always been an acceptable solution, unfortunately.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Aleksander Alekseev
Date:
Hi Ants,

On Tue, Jun 13, 2017 at 09:07:49AM -0400, Peter Eisentraut wrote:
> On 6/12/17 17:11, Ants Aasma wrote:
> > I'm curious if the community thinks this is a feature worth having?
> > Even considering that security experts would classify this kind of
> > encryption as a checkbox feature.
>
> File system encryption already exists and is well-tested.  I don't see
> any big advantages in re-implementing all of this one level up.  You
> would have to touch every single place in PostgreSQL backend and tool
> code where a file is being read or written.  Yikes.

I appreciate your work, but unfortunately I must agree with Peter.

On Linux you can configure the full disc encryption using LUKS /
dm-crypt in like 5 minutes [1]. On FreeBSD you can do the same using
geli [2]. In my personal opinion PostgreSQL is already complicated
enough. A few companies that hired system administrators that are too
lazy to read two or three man pages is not a reason to re-implement file
system encryption (or compression, or mirroring if that matters) in any
open source RDBMS.

[1] http://eax.me/dm-crypt/
[2] http://eax.me/freebsd-geli/

--
Best regards,
Aleksander Alekseev

Re: [HACKERS] WIP: Data at rest encryption

From
Kenneth Marshall
Date:
On Wed, Jun 14, 2017 at 12:04:26PM +0300, Aleksander Alekseev wrote:
> Hi Ants,
> 
> On Tue, Jun 13, 2017 at 09:07:49AM -0400, Peter Eisentraut wrote:
> > On 6/12/17 17:11, Ants Aasma wrote:
> > > I'm curious if the community thinks this is a feature worth having?
> > > Even considering that security experts would classify this kind of
> > > encryption as a checkbox feature.
> > 
> > File system encryption already exists and is well-tested.  I don't see
> > any big advantages in re-implementing all of this one level up.  You
> > would have to touch every single place in PostgreSQL backend and tool
> > code where a file is being read or written.  Yikes.
> 
> I appreciate your work, but unfortunately I must agree with Peter.
> 
> On Linux you can configure the full disc encryption using LUKS /
> dm-crypt in like 5 minutes [1]. On FreeBSD you can do the same using
> geli [2]. In my personal opinion PostgreSQL is already complicated
> enough. A few companies that hired system administrators that are too
> lazy to read two or three man pages is not a reason to re-implement file
> system encryption (or compression, or mirroring if that matters) in any
> open source RDBMS.
> 

Hi Aleksander,

While I agree that configuring full disk encryption is not technically
difficult, it requires much more privileged access to the system and
basically requires the support of a system administrator. In addition,
if a volume is not available for encryption, PostgreSQL support for
encryption would still allow for its data to be encrypted and as others
have mentioned can be enabled by the DBA alone.

Regards,
Ken



Re: [HACKERS] WIP: Data at rest encryption

From
Aleksander Alekseev
Date:
Hi Kenneth,

> > > File system encryption already exists and is well-tested.  I don't see
> > > any big advantages in re-implementing all of this one level up.  You
> > > would have to touch every single place in PostgreSQL backend and tool
> > > code where a file is being read or written.  Yikes.
> >
> > I appreciate your work, but unfortunately I must agree with Peter.
> >
> > On Linux you can configure the full disc encryption using LUKS /
> > dm-crypt in like 5 minutes [1]. On FreeBSD you can do the same using
> > geli [2]. In my personal opinion PostgreSQL is already complicated
> > enough. A few companies that hired system administrators that are too
> > lazy to read two or three man pages is not a reason to re-implement file
> > system encryption (or compression, or mirroring if that matters) in any
> > open source RDBMS.
>
> While I agree that configuring full disk encryption is not technically
> difficult, it requires much more privileged access to the system and
> basically requires the support of a system administrator. In addition,
> if a volume is not available for encryption, PostgreSQL support for
> encryption would still allow for its data to be encrypted and as others
> have mentioned can be enabled by the DBA alone.

Frankly I'm having difficulties imagining when it could be a real
problem. It doesn't seem to be such a burden to ask a colleague for
assistance in case you don't have sufficient permissions to do
something. And I got a strong feeling that solving bureaucracy issues of
specific organizations by changing PostgreSQL core in very invasive way
(keeping in mind testing, maintaining, etc) is misguided.

--
Best regards,
Aleksander Alekseev

Re: [HACKERS] WIP: Data at rest encryption

From
Aleksander Alekseev
Date:
> > While I agree that configuring full disk encryption is not technically
> > difficult, it requires much more privileged access to the system and
> > basically requires the support of a system administrator. In addition,
> > if a volume is not available for encryption, PostgreSQL support for
> > encryption would still allow for its data to be encrypted and as others
> > have mentioned can be enabled by the DBA alone.
>
> Frankly I'm having difficulties imagining when it could be a real
> problem. It doesn't seem to be such a burden to ask a colleague for
> assistance in case you don't have sufficient permissions to do
> something. And I got a strong feeling that solving bureaucracy issues of
> specific organizations by changing PostgreSQL core in very invasive way
> (keeping in mind testing, maintaining, etc) is misguided.

In the same time implementing a plugable storage API and then implementing
encrypted / compressed / whatever storage in a standalone extension using
this API seems to be a reasonable thing to do.

--
Best regards,
Aleksander Alekseev

Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Wed, Jun 14, 2017 at 04:13:57PM +0300, Aleksander Alekseev wrote:
> > > While I agree that configuring full disk encryption is not technically
> > > difficult, it requires much more privileged access to the system and
> > > basically requires the support of a system administrator. In addition,
> > > if a volume is not available for encryption, PostgreSQL support for
> > > encryption would still allow for its data to be encrypted and as others
> > > have mentioned can be enabled by the DBA alone.
> > 
> > Frankly I'm having difficulties imagining when it could be a real
> > problem. It doesn't seem to be such a burden to ask a colleague for
> > assistance in case you don't have sufficient permissions to do
> > something. And I got a strong feeling that solving bureaucracy issues of
> > specific organizations by changing PostgreSQL core in very invasive way
> > (keeping in mind testing, maintaining, etc) is misguided.
> 
> In the same time implementing a plugable storage API and then implementing
> encrypted / compressed / whatever storage in a standalone extension using
> this API seems to be a reasonable thing to do. 

Agreed, good point.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Tue, Jun 13, 2017 at 06:29:20PM -0400, Stephen Frost wrote:
> > Isn't the leakage controlled by OS permissions, so is it really leakage,
> > i.e., if you can see the leakage, you probably have bypassed the OS
> > permissions and see the key and data anyway.
> 
> The case I'm mainly considering is if you somehow lose control over the
> medium in which the encrypted database resides- that is, someone steals
> the hard drive, or perhaps the hard drive is sold without properly being
> wiped first, things like that.
> 
> In such a case, there's no OS permissions to bypass because the OS is
> now controlled by the attacker.  In that case, if the key wasn't stored
> on the hard drive, then the attacker would be able to see the contents
> of the filesystem and the associated metadata, but not the contents of
> the cluster.
> 
> In that case, the distinction between filesystem-level encryption and
> PG-level encryption is that with filesystem-level encryption the
> attacker wouldn't be able to even see what files exist or any metadata
> about them, whereas with PG-level encryption that information would be
> available to the attacker.

Yes, Peter Eisentraut pointed that out in an earlier email in this
thread.  Thanks.

> > > > The benefit is allowing configuration
> > > > in the database rather than the OS?
> > > 
> > > No, the benefit is that the database administrator can configure it and
> > > set it up and not have to get an OS-level administrator involved.  There
> > > may also be other reasons why filesystem-level encryption is difficult
> > > to set up or use in a certain environment, but this wouldn't depend on
> > > anything OS-related and therefore could be done.
> > 
> > OK, my only point here is that we are going down a slippery slope of
> > implementing OS things in the database.  There is nothing wrong with
> > that but it has often been something we have avoided, because of the
> > added complexity required in the db server.
> 
> I'm not sure that I actually agree that encryption is really solely an
> OS-level function, or even that encryption at rest is solely the OS's
> job.  As a counter-example, I encrypt my SSH keys and GPG keys
> routinely, even when I'm using OS-level filesystem encryption.  Perhaps
> that's excessive of me, but, well, I don't find it so. :)

Well, I think SSH and GPG keys are a case of selective encryption, which
is where I said database encryption could really be a win, because you
can't do that outside the database.  Just to go crazy, here is something
I think would be cool for a fully or partially encrypted data row:
row data encrypted with symmetric key kk encrypted with public key of user 1k encrypted with public key of user 2hash
ofprevious fields signed by insert user
 

The would allow the insert user complete control over who sees the row. 
The database administrator could corrupt the row or add/remove users,
but that would be detected by the hash signature being invalid.  This is
kind of like TLS bundled in the database.  I think this is actually all
possible now too.  :-)

> > As a counter-example, we only added an external collation library to
> > Postgres when we clearly saw a benefit, e.g. detecting collation
> > changes.
> 
> Right, and there's also the potential for adding more flexibility down
> the road, which I'm certainly all for, but I see value in having even
> this initial version of the feature too.

Understood.

> > > Also, tools like pg_basebackup could be used, with nearly zero changes,
> > > I think, to get an encrypted copy of the database for backup purposes,
> > > removing the need to work out a way to handle encrypting backups.
> > 
> > I do think we need much more documentation on how to encrypt things,
> > though that is a separate issue.  It might help to document how you
> > _should_ do things now to see the limitations we have.
> 
> Improving our documentation would certainly be good, but I'm not sure
> that we can really recommend specific ways of doing things like
> filesystem-level encryption, as that's really OS-dependent and there
> could be trade-offs in different ways a given OS might provide that
> capability.  I'm not sure that having our documentation generically say
> "you should use filesystem-level encryption" would really be very
> helpful.
> 
> Perhaps I misunderstood your suggestion here though..?

I was just throwing out the idea that sometimes writing things down
shows the gaps in our feature-set --- it might not apply here.

> > > > Is the problem that you have to encrypt before sending and decrypt on
> > > > arrival, if you don't trust the transmission link?  Is that used a lot? 
> > > > Is having the db encrypt every write a reasonable solution to that?
> > > 
> > > There's multiple use-cases here.  Making it easier to copy the database
> > > is just one of them and it isn't the biggest one.  The biggest benefit
> > > is that there's cases where filesystem-level encryption isn't really an
> > > option or, even if it is, it's not desirable for other reasons.
> > 
> > I am thinking NAS storage you don't trust, though there is the leakage
> > there.
> 
> Yes, NAS or SAN storage where you don't want the NAS/SAN administrators
> to have access to the data is a good example of where encryption would
> be useful.  Of course, either filesystem-level or PG-level encryption
> would address that, but with the trade-off that PG-level encryption
> would allow the NAS/SAN administrators to see the file metadata, as
> discussed above.

Uh, as far as I understand it, SAN could technically use encryption
because you are sending blocks to the network storage and you could
encrypt the blocks before transfer.  However, I don't think that would
work for NAS/NFS.

> > Also, has anyone asked users if they would find db-encryption better
> > than file system encryption?
> 
> I've been asked for this capability multiple times from our users and
> have generally pushed back and encouraged filesystem-level encryption.
> That hasn't always been an acceptable solution, unfortunately.

Yes, it would be good to know how often that happens, and whether we
should be adjusting Postgres to address it.  The idea posted of using
plugin storage for encryption seems like a cool idea, and compression
and stuff could be added.  (However, I realize encryption and compression
doesn't work well because of information leakage.)

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Ants Aasma
Date:
On Tue, Jun 13, 2017 at 6:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Of course, what would be even more useful is fine-grained encryption -
> encrypt these tables (and the corresponding indexes, toast tables, and
> WAL records related to any of that) with this key, encrypt these other
> tables (and the same list of associated stuff) with this other key,
> and leave the rest unencrypted.  The problem with that is that you
> probably can't run recovery without all of the keys, and even on a
> clean startup there would be a good deal of engineering work involved
> in refusing access to tables whose key hadn't been provided yet.

That's pretty much the reason why I decided to skip anything more
complicated for now. Anything that would be able to run recovery
without knowing the encryption key looks like an order of magnitude
more complicated to implement.

> Performance is likely to be poor on large databases,
> because every time a page transits between shared_buffers and the
> buffer cache we've got to en/decrypt, but as long as it's only poor
> for the people who opt into the feature I don't see a big problem with
> that.

It would make sense to tune the database with large shared buffers if
encryption is enabled. That should make sure that most shared buffers
traffic is going to disk anyway. As for performance, I have a
prototype assembly implementation of AES that does 3GB/s/core on my
laptop. If we add that behind a CPUID check the overhead should be
quite reasonable.

> I anticipate that one of the trickier problems here will be handling
> encryption of the write-ahead log.  Suppose you encrypt WAL a block at
> a time.  In the current system, once you've written and flushed a
> block, you can consider it durably committed, but if that block is
> encrypted, this is no longer true.  A crash might tear the block,
> making it impossible to decrypt.  Replay will therefore stop at the
> end of the previous block, not at the last record actually flushed as
> would happen today.

My patch is currenly doing a block at a time for WAL. The XTS mode
used to encrypt has the useful property that blocks that share
identical prefix unencrypted also have identical prefix when
encrypted. It requires that the tearing is 16B aligned, but I think
that is true for pretty much all storage systems. That property of
course has security downsides, but for table/index storage we have a
nonce in the form of LSN in the page header eliminating the issue.

> So, your synchronous_commit suddenly isn't.  A
> similar problem will occur any other page where we choose not to
> protect against torn pages using full page writes.  For instance,
> unless checksums are enabled or wal_log_hints=on, we'll write a data
> page where a single bit has been flipped and assume that the bit will
> either make it to disk or not; the page can't really be torn in any
> way that hurts us.  But with encryption that's no longer true, because
> the hint bit will turn into much more than a single bit flip, and
> rereading that page with half old and half new contents will be the
> end of the world (TM).  I don't know off-hand whether we're
> protecting, say, CLOG page writes with FPWs.: because setting a couple
> of bits is idempotent and doesn't depend on the existing page
> contents, we might not need it currently, but with encryption, every
> bit in the page depends on every other bit in the page, so we
> certainly would.  I don't know how many places we've got assumptions
> like this baked into the system, but I'm guessing there are a bunch.

I think we need to require wal_log_hints=on when encryption is
enabled. Currently I have not considered tearing on CLOG bits. Other
SLRUs probably have similar issues. I need to think a bit about how to
solve that.

Regards,
Ants Aasma



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Wed, Jun 14, 2017 at 06:10:32PM +0300, Ants Aasma wrote:
> On Tue, Jun 13, 2017 at 6:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > Performance is likely to be poor on large databases,
> > because every time a page transits between shared_buffers and the
> > buffer cache we've got to en/decrypt, but as long as it's only poor
> > for the people who opt into the feature I don't see a big problem with
> > that.
> 
> It would make sense to tune the database with large shared buffers if
> encryption is enabled. That should make sure that most shared buffers
> traffic is going to disk anyway. As for performance, I have a
> prototype assembly implementation of AES that does 3GB/s/core on my
> laptop. If we add that behind a CPUID check the overhead should be
> quite reasonable.

Are you checking the CPU type or if AES instructions are enabled on the
CPU? I ask this because I just realized in researching my new TLS talk
that my BIOS defaults to AES instructions disabled, and I had to
manually enable it.

> > I anticipate that one of the trickier problems here will be handling
> > encryption of the write-ahead log.  Suppose you encrypt WAL a block at
> > a time.  In the current system, once you've written and flushed a
> > block, you can consider it durably committed, but if that block is
> > encrypted, this is no longer true.  A crash might tear the block,
> > making it impossible to decrypt.  Replay will therefore stop at the
> > end of the previous block, not at the last record actually flushed as
> > would happen today.
> 
> My patch is currenly doing a block at a time for WAL. The XTS mode

Uh, how are you writing partial writes to the WAL.  I assume you are
doing a streaming cipher so you can write in increments, right?

> used to encrypt has the useful property that blocks that share
> identical prefix unencrypted also have identical prefix when
> encrypted. It requires that the tearing is 16B aligned, but I think
> that is true for pretty much all storage systems. That property of
> course has security downsides, but for table/index storage we have a
> nonce in the form of LSN in the page header eliminating the issue.
> 
> > So, your synchronous_commit suddenly isn't.  A
> > similar problem will occur any other page where we choose not to
> > protect against torn pages using full page writes.  For instance,
> > unless checksums are enabled or wal_log_hints=on, we'll write a data
> > page where a single bit has been flipped and assume that the bit will
> > either make it to disk or not; the page can't really be torn in any
> > way that hurts us.  But with encryption that's no longer true, because
> > the hint bit will turn into much more than a single bit flip, and
> > rereading that page with half old and half new contents will be the
> > end of the world (TM).  I don't know off-hand whether we're
> > protecting, say, CLOG page writes with FPWs.: because setting a couple
> > of bits is idempotent and doesn't depend on the existing page
> > contents, we might not need it currently, but with encryption, every
> > bit in the page depends on every other bit in the page, so we
> > certainly would.  I don't know how many places we've got assumptions
> > like this baked into the system, but I'm guessing there are a bunch.
> 
> I think we need to require wal_log_hints=on when encryption is
> enabled. Currently I have not considered tearing on CLOG bits. Other
> SLRUs probably have similar issues. I need to think a bit about how to
> solve that.

I am not sure if clog even needs to be encrypted.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Ants Aasma
Date:
On Wed, Jun 14, 2017 at 6:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Are you checking the CPU type or if AES instructions are enabled on the
> CPU? I ask this because I just realized in researching my new TLS talk
> that my BIOS defaults to AES instructions disabled, and I had to
> manually enable it.

There is zero code for that now, but the plan was to check the CPUID
instruction. My understanding is that it should report what is
currently enabled on the CPU. Will double check when actually writing
the code for the check.

>> > I anticipate that one of the trickier problems here will be handling
>> > encryption of the write-ahead log.  Suppose you encrypt WAL a block at
>> > a time.  In the current system, once you've written and flushed a
>> > block, you can consider it durably committed, but if that block is
>> > encrypted, this is no longer true.  A crash might tear the block,
>> > making it impossible to decrypt.  Replay will therefore stop at the
>> > end of the previous block, not at the last record actually flushed as
>> > would happen today.
>>
>> My patch is currenly doing a block at a time for WAL. The XTS mode
>
> Uh, how are you writing partial writes to the WAL.  I assume you are
> doing a streaming cipher so you can write in increments, right?

We were doing 8kB page aligned writes to WAL anyway. I just encrypt
the block before it gets written out.

>> I think we need to require wal_log_hints=on when encryption is
>> enabled. Currently I have not considered tearing on CLOG bits. Other
>> SLRUs probably have similar issues. I need to think a bit about how to
>> solve that.
>
> I am not sure if clog even needs to be encrypted.

Me neither, but it currently is, and it looks like that's broken in a
"silently corrupts your data" way in face of torn writes. Using OFB
mode (xor plaintext with pseudorandom stream for cipher) looks like it
might help here, if other approaches fail.

Regards,
Ants Aasma



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Wed, Jun 14, 2017 at 06:41:43PM +0300, Ants Aasma wrote:
> On Wed, Jun 14, 2017 at 6:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Are you checking the CPU type or if AES instructions are enabled on the
> > CPU? I ask this because I just realized in researching my new TLS talk
> > that my BIOS defaults to AES instructions disabled, and I had to
> > manually enable it.
> 
> There is zero code for that now, but the plan was to check the CPUID
> instruction. My understanding is that it should report what is
> currently enabled on the CPU. Will double check when actually writing
> the code for the check.

Just for specifics, I have two Intel Xeon CPU E5620, but the AES
instructions were disabled for this CPU since 2012 when I bought it. 
:-(  The good news is that only recently have I forced https in some
pages so this is the first time I heavily need it.  I now have a boot
test, which returns 16:
grep -c '\<aes\>' /proc/cpuinfo

> >> > I anticipate that one of the trickier problems here will be handling
> >> > encryption of the write-ahead log.  Suppose you encrypt WAL a block at
> >> > a time.  In the current system, once you've written and flushed a
> >> > block, you can consider it durably committed, but if that block is
> >> > encrypted, this is no longer true.  A crash might tear the block,
> >> > making it impossible to decrypt.  Replay will therefore stop at the
> >> > end of the previous block, not at the last record actually flushed as
> >> > would happen today.
> >>
> >> My patch is currently doing a block at a time for WAL. The XTS mode
> >
> > Uh, how are you writing partial writes to the WAL.  I assume you are
> > doing a streaming cipher so you can write in increments, right?
> 
> We were doing 8kB page aligned writes to WAL anyway. I just encrypt
> the block before it gets written out.

Oh, we do.  The beauty of streaming ciphers built on block ciphers is
that you can pre-compute the cipher to be XOR'ed with the data because
the block cipher output doesn't depend on the user data.  This is used
for SSH, for example.

> >> I think we need to require wal_log_hints=on when encryption is
> >> enabled. Currently I have not considered tearing on CLOG bits. Other
> >> SLRUs probably have similar issues. I need to think a bit about how to
> >> solve that.
> >
> > I am not sure if clog even needs to be encrypted.
> 
> Me neither, but it currently is, and it looks like that's broken in a
> "silently corrupts your data" way in face of torn writes. Using OFB
> mode (xor plaintext with pseudorandom stream for cipher) looks like it
> might help here, if other approaches fail.

I would just document the limitation and move on.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/14/17 05:04, Aleksander Alekseev wrote:
> A few companies that hired system administrators that are too
> lazy to read two or three man pages is not a reason to re-implement file
> system encryption (or compression, or mirroring if that matters) in any
> open source RDBMS.

To be fair, we did implement our own compression (TOAST) and mirroring
(compared to, say, DRBD) because there were clear advantages in
simplicity and performance.  Checksumming is another area where we moved
forward in spite of arguments that the file system should do it.  So we
will need to see the whole picture.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/13/17 18:11, Stephen Frost wrote:
>> Let's see a proposal in those terms then.  How easy can you make it,
>> compared to existing OS-level solutions, and will that justify the
>> maintenance overhead?
> From the original post on this thread, which included a WIP patch:
> 
> ----------------------------------
> Usage
> =====
> 
> Set up database like so:
> 
>     (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
>      export PGENCRYPTIONKEY
>      initdb -k -K pgcrypto $PGDATA )
> 
> Start PostgreSQL:
> 
>     (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
>      export PGENCRYPTIONKEY
>      postgres $PGDATA )
> ----------------------------------

Relying on environment variables is clearly pretty crappy.  So if that's
the proposal, then I think it needs to be better.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Peter,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 6/13/17 18:11, Stephen Frost wrote:
> >> Let's see a proposal in those terms then.  How easy can you make it,
> >> compared to existing OS-level solutions, and will that justify the
> >> maintenance overhead?
> > From the original post on this thread, which included a WIP patch:
> >
> > ----------------------------------
> > Usage
> > =====
> >
> > Set up database like so:
> >
> >     (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
> >      export PGENCRYPTIONKEY
> >      initdb -k -K pgcrypto $PGDATA )
> >
> > Start PostgreSQL:
> >
> >     (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
> >      export PGENCRYPTIONKEY
> >      postgres $PGDATA )
> > ----------------------------------
>
> Relying on environment variables is clearly pretty crappy.  So if that's
> the proposal, then I think it needs to be better.

I don't believe that was ever intended to be the final solution, I was
just pointing out that it's what the WIP patch did.

The discussion had moved into having a command called which provided the
key on stdout, as I recall, allowing it to be whatever the user wished,
including binary of any kind.

If you have other suggestions, I'm sure they would be well received.  As
to the question of complexity, it certainly looks like it'll probably be
quite straight-forward for users to use.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Robert Haas
Date:
On Wed, Jun 14, 2017 at 5:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
> I don't believe that was ever intended to be the final solution, I was
> just pointing out that it's what the WIP patch did.
>
> The discussion had moved into having a command called which provided the
> key on stdout, as I recall, allowing it to be whatever the user wished,
> including binary of any kind.
>
> If you have other suggestions, I'm sure they would be well received.  As
> to the question of complexity, it certainly looks like it'll probably be
> quite straight-forward for users to use.

To me, this reads a bit like you're still trying to shut down the
discussion here.  Perhaps I am misreading it.

Upthread, you basically said that we shouldn't talk about key
management (specifically, you said, "Key management is an entirely
independent discussion from this") which I think is a ridiculous
statement.  We have to have some kind of simple key management in
order to have the feature at all.  It does not have to be crazy
complicated, but it has to exist.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Data at rest encryption

From
Peter Eisentraut
Date:
On 6/14/17 17:41, Stephen Frost wrote:
>> Relying on environment variables is clearly pretty crappy.  So if that's
>> the proposal, then I think it needs to be better.
> I don't believe that was ever intended to be the final solution, I was
> just pointing out that it's what the WIP patch did.
> 
> The discussion had moved into having a command called which provided the
> key on stdout, as I recall, allowing it to be whatever the user wished,
> including binary of any kind.
> 
> If you have other suggestions, I'm sure they would be well received.  As
> to the question of complexity, it certainly looks like it'll probably be
> quite straight-forward for users to use.

I think the passphrase entry part of the problem is actually a bit
harder than it appears.

Making this work well would be a major part of the usability story that
this is being sold on.  If the proposed solution is that you can cobble
together a few bits of shell, then not only is that not very
user-friendly, it also won't work consistently across platforms, won't
work under systemd (launchd? Windows service?), and might behave
awkwardly under restricted environments where there is no terminal or
only a limited OS environment.  Moreover, it leaves the security aspects
of that part of the solution (keys lingering in memory or in swap) up to
the user.

There was a discussion a while ago about how to handle passphrase entry
for SSL keys.  The conclusion was that it works pretty crappily right
now, and several suggestions for improvement were discussed.  I suggest
that fixing that properly and with flexibility could also yield a
solution for encryption key entry.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Robert Haas
Date:
On Thu, Jun 15, 2017 at 12:06 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Making this work well would be a major part of the usability story that
> this is being sold on.  If the proposed solution is that you can cobble
> together a few bits of shell, then not only is that not very
> user-friendly, it also won't work consistently across platforms, won't
> work under systemd (launchd? Windows service?), and might behave
> awkwardly under restricted environments where there is no terminal or
> only a limited OS environment.  Moreover, it leaves the security aspects
> of that part of the solution (keys lingering in memory or in swap) up to
> the user.
>
> There was a discussion a while ago about how to handle passphrase entry
> for SSL keys.  The conclusion was that it works pretty crappily right
> now, and several suggestions for improvement were discussed.  I suggest
> that fixing that properly and with flexibility could also yield a
> solution for encryption key entry.

That sounds like a good idea to me.

However, I'd like to disagree with the idea that key management is the
primary way in which this feature would improve usability.  To me, the
big advantage is that you don't need to be root (also, we can have
more consistency in behavior across operating systems).  I disagree
vigorously with the idea that anyone who wants to encrypt their
PostgreSQL database should just get root privileges on the system and
use an encrypted filesystem.  In some environments, "just get root
privileges" is not something that is very easy to do; but even if you
*have* root privileges, you don't necessarily want to have to use them
just to install and configure your database.

Right now, I can compile and install PostgreSQL from my own user
account, run initdb, and start it up.  Surely everyone PostgreSQL
developer in the world - and a good number of users - would agree that
if I suddenly needed to run initdb as root, that would be a huge
usability regression.  You wouldn't even be able to run 'make check'
without root privileges, which would suck.  In the same way, when we
removed (for most users) the need to tune System V shared memory
parameters (b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67), a lot of users
were very happy precisely because it eliminated a setup step that
formerly had to be done as root.  It is very reasonable to suppose
that users who need encryption will similarly be happy if they no
longer need to be root to get an encrypted PostgreSQL working.  Sure,
that's only a subset of users rather than all of them, but it's the
same kind of issue.

Also, I don't think we should be presenting filesystem encryption and
built-in encryption as if they were two methods of solving the exact
same problem.  In some scenarios, they are in fact solving the same
problem.  However, it's entirely reasonable to want to use both.  For
example, the hard disk in my laptop is encrypted, because that's a
thing Apple does.  If somebody steals the hard disk out of my laptop,
they may have some difficulty recovering the contents. However, as
Stephen also mentioned, that has not deterred me from putting a
passphrase on my SSH keys.  There are situations in which the
passphrase provides protection that the whole-drive encryption won't.
For example, if I copy my home directory onto a USB stick and then
copy it from there to a new laptop, somebody might steal the USB
stick.  If they manage to do that, they will get most of my files, but
they won't get my SSH keys, or at least not without guessing my
passphrase.  Similarly, if I foolishly walk away from my laptop in the
presence of some nefarious person (say, Magnus) without locking it,
that person can't steal my keys.  That person might be able to
impersonate me for as long as I'm away from the laptop, if the keys
are loaded into my SSH agent, but not afterwards.  The issues for the
database are similar.  You might want one of these things or the other
or both depending on the exact situation.

To be honest, I find the hostility toward this feature a bit baffling.
The argument seems to be essentially that we shouldn't have this
feature because we'd have to maintain the code and many of the same
goals could be accomplished by using facilities that already exist
outside the database server.  But that's also true for parallel query
(cf. Stado), logical replication (cf. Slony, Bucardo, Londiste),
physical replication (cf. DRBD), partitioning (cf. pg_partman), RLS
(cf. veil), and anything that could be written as application logic
(eg. psql's \if ... \endif, every procedural language we have,
user-defined functions themselves, database-enforced constraints,
FDWs).  Yet, in every one of those cases, we find it worthwhile to
have the feature because it works better and is easier to use when
it's built in.  I don't think that a patch for this feature is likely
to be bigger than (or even as large as) the patches for logical
replication or parallel query, and it will probably be less work to
maintain going forward than either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Thu, Jun 15, 2017 at 03:09:32PM -0400, Robert Haas wrote:
> To be honest, I find the hostility toward this feature a bit baffling.
> The argument seems to be essentially that we shouldn't have this
> feature because we'd have to maintain the code and many of the same
> goals could be accomplished by using facilities that already exist
> outside the database server.  But that's also true for parallel query
> (cf. Stado), logical replication (cf. Slony, Bucardo, Londiste),
> physical replication (cf. DRBD), partitioning (cf. pg_partman), RLS
> (cf. veil), and anything that could be written as application logic
> (eg. psql's \if ... \endif, every procedural language we have,
> user-defined functions themselves, database-enforced constraints,
> FDWs).  Yet, in every one of those cases, we find it worthwhile to
> have the feature because it works better and is easier to use when
> it's built in.  I don't think that a patch for this feature is likely
> to be bigger than (or even as large as) the patches for logical
> replication or parallel query, and it will probably be less work to
> maintain going forward than either.

I think the big win for having OS features in the database is
selectivity --- the ability to selectively apply a feature to part of
the database.  This is what you are doing by putting a password on your
SSH key, and my idea about row encryption.

It is also a question of convenience.  If SSH told users they have to
create an encrypted volume to store their SSH keys with a password, it
would be silly, since the files are so small compared to a file system. 
I think the assumption is that any security-concerned deployment of
Postgres will already have Postgres on its own partition and have the
root administrator involved.  I think it is this assumption that drives
the idea that requiring root to run Postgres doesn't make sense, but it
does to do encryption.  Also, there is the sense that security requires
trust of the root user, while using Postgres doesn't require the root
user to also use Postgres.

One serious difference between in-database-encryption and SSH keys is
that the use of passwords for SSH is well understood and reasonable to
use, while I think we all admit that use of passwords for database
objects like SSL keys is murky.  Use of keys for OS-level encryption is
a little better handled, but not as clean as SSH keys.

I admit there is no hard line here, so I guess we will have to see what
the final patch looks like.  I am basing my statements on what I guess
the complexity will be.  Complexity has a cost so we will have to weigh
it when we see it.  When SSH added password access, it was probably an
easy decision because the use-case was high and the complexity was low.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Robert Haas
Date:
On Thu, Jun 15, 2017 at 4:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
> I think the big win for having OS features in the database is
> selectivity --- the ability to selectively apply a feature to part of
> the database.  This is what you are doing by putting a password on your
> SSH key, and my idea about row encryption.

I agree. I think we will eventually want to be able to apply
encryption selectively, although I don't think we need to have that
feature in a first patch.  One problem is that if you don't encrypt
the WAL, there's not much point in encrypting the table data, so it
becomes tricky to encrypt some things and not others.  However, I am
sure we can eventually solve those problems, given enough time and
development effort.

> It is also a question of convenience.  If SSH told users they have to
> create an encrypted volume to store their SSH keys with a password, it
> would be silly, since the files are so small compared to a file system.
> I think the assumption is that any security-concerned deployment of
> Postgres will already have Postgres on its own partition and have the
> root administrator involved.  I think it is this assumption that drives
> the idea that requiring root to run Postgres doesn't make sense, but it
> does to do encryption.

I don't think that's a particularly good assumption, though,
especially with the proliferation of virtual and containerized
environments where access to root privileges tends to be more
circumscribed.  Also, there are a lot of small databases out there
that you might want to be able to encrypt without encrypting
everything on the filesystem.  For example, there are products that
embed PostgreSQL.  If a particular PostgreSQL configuration requires
root access, then using that configuration means that the installing
the product which contains it also requires root access.  Installing
the product means changing /etc/fstab, and uninstalling it means
reversing those changes.  That's very awkward.  I agree that if you've
got a terabyte of sensitive data, you probably want to encrypt the
filesystem and involve the DBA, but there are still people who have a
gigabyte of sensitive data.  For those people, a separate filesystem
likely doesn't make sense, but they may still want encryption.

> Also, there is the sense that security requires
> trust of the root user, while using Postgres doesn't require the root
> user to also use Postgres.

I don't understand this.  It is certainly true that you're running
binaries owned by root, the root user could Trojan the binaries and
break any security you think you have.  But that problem is no better
or worse for PostgreSQL than anything else.

> One serious difference between in-database-encryption and SSH keys is
> that the use of passwords for SSH is well understood and reasonable to
> use, while I think we all admit that use of passwords for database
> objects like SSL keys is murky.  Use of keys for OS-level encryption is
> a little better handled, but not as clean as SSH keys.

Peter pointed out upthread that our handling of SSL passphrases leaves
a lot to be desired, and that maybe we should fix that problem first;
I agree.  But I don't think this is any kind of intrinsic limitation
of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a
quality-of-implementation issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Thu, Jun 15, 2017 at 05:04:17PM -0400, Robert Haas wrote:
> > Also, there is the sense that security requires
> > trust of the root user, while using Postgres doesn't require the root
> > user to also use Postgres.
> 
> I don't understand this.  It is certainly true that you're running
> binaries owned by root, the root user could Trojan the binaries and
> break any security you think you have.  But that problem is no better
> or worse for PostgreSQL than anything else.

I couldn't find a cleaner way to see it --- it is that database use
doesn't involve the root user using it, while database security requires
the root user to also be security-conscious.

> > One serious difference between in-database-encryption and SSH keys is
> > that the use of passwords for SSH is well understood and reasonable to
> > use, while I think we all admit that use of passwords for database
> > objects like SSL keys is murky.  Use of keys for OS-level encryption is
> > a little better handled, but not as clean as SSH keys.
> 
> Peter pointed out upthread that our handling of SSL passphrases leaves
> a lot to be desired, and that maybe we should fix that problem first;
> I agree.  But I don't think this is any kind of intrinsic limitation
> of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a
> quality-of-implementation issue.

I think there are environmental issues that make password use on SSH
easier than the other cases --- it isn't just code quality.  However, it
would be good to research how SSH handles it to see if we can get any
ideas.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Jun 15, 2017 at 05:04:17PM -0400, Robert Haas wrote:
> > > Also, there is the sense that security requires
> > > trust of the root user, while using Postgres doesn't require the root
> > > user to also use Postgres.
> >
> > I don't understand this.  It is certainly true that you're running
> > binaries owned by root, the root user could Trojan the binaries and
> > break any security you think you have.  But that problem is no better
> > or worse for PostgreSQL than anything else.
>
> I couldn't find a cleaner way to see it --- it is that database use
> doesn't involve the root user using it, while database security requires
> the root user to also be security-conscious.

I tend to agree with Robert here (in general).  Further, there are
certainly environments where the administrator with root access is
absolutely security conscious, but that doesn't mean that the DBA has
easy access to the root account or to folks who have that level of
access and it's much easier for the DBA if they're able to address all
of their requirements by building PG themselves, installing it, and,
ideally, encrypting the DB.

> > > One serious difference between in-database-encryption and SSH keys is
> > > that the use of passwords for SSH is well understood and reasonable to
> > > use, while I think we all admit that use of passwords for database
> > > objects like SSL keys is murky.  Use of keys for OS-level encryption is
> > > a little better handled, but not as clean as SSH keys.
> >
> > Peter pointed out upthread that our handling of SSL passphrases leaves
> > a lot to be desired, and that maybe we should fix that problem first;
> > I agree.  But I don't think this is any kind of intrinsic limitation
> > of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a
> > quality-of-implementation issue.

I'm not thrilled with asking Ants to implement a solution to SSL
passphrases, and generalizing it to work for this, to get this feature
accepted.  I assume that the reason for asking for that work to be done
now is because we decided that the current approach for SSL sucks but we
couldn't actually drop support for it, but we don't want to add other
features which work in a similar way because, well, it sucks.

I get that.  I'm not thrilled with it, but I get it.  I'm hopeful it
ends up not being too bad, but if it ends up meaning we don't get this
feature, then I'll reconsider my position about agreeing that it's an
acceptable requirement.

> I think there are environmental issues that make password use on SSH
> easier than the other cases --- it isn't just code quality.  However, it
> would be good to research how SSH handles it to see if we can get any
> ideas.

Actually, the approach SSH uses is a really good one, imv, and one which
we might be able to leverage..  I'm not sure though.  I will say that,
in general, I like the idea of leveraging the external libraries which
handle keys and deal with encryption to make this happen as those allow
things like hardware devices to hold the key and possibly perform the
encryption/decryption, etc.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Thu, Jun 15, 2017 at 06:41:08PM -0400, Stephen Frost wrote:
> > > > One serious difference between in-database-encryption and SSH keys is
> > > > that the use of passwords for SSH is well understood and reasonable to
> > > > use, while I think we all admit that use of passwords for database
> > > > objects like SSL keys is murky.  Use of keys for OS-level encryption is
> > > > a little better handled, but not as clean as SSH keys.
> > > 
> > > Peter pointed out upthread that our handling of SSL passphrases leaves
> > > a lot to be desired, and that maybe we should fix that problem first;
> > > I agree.  But I don't think this is any kind of intrinsic limitation
> > > of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a
> > > quality-of-implementation issue.
> 
> I'm not thrilled with asking Ants to implement a solution to SSL
> passphrases, and generalizing it to work for this, to get this feature
> accepted.  I assume that the reason for asking for that work to be done
> now is because we decided that the current approach for SSL sucks but we
> couldn't actually drop support for it, but we don't want to add other
> features which work in a similar way because, well, it sucks.

My point is that if our support for db-level encryption is as bad as SSL
key passwords, then it will be nearly useless, so we might as well not
have it.  Isn't that obvious?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Jun 15, 2017 at 06:41:08PM -0400, Stephen Frost wrote:
> > > > > One serious difference between in-database-encryption and SSH keys is
> > > > > that the use of passwords for SSH is well understood and reasonable to
> > > > > use, while I think we all admit that use of passwords for database
> > > > > objects like SSL keys is murky.  Use of keys for OS-level encryption is
> > > > > a little better handled, but not as clean as SSH keys.
> > > >
> > > > Peter pointed out upthread that our handling of SSL passphrases leaves
> > > > a lot to be desired, and that maybe we should fix that problem first;
> > > > I agree.  But I don't think this is any kind of intrinsic limitation
> > > > of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a
> > > > quality-of-implementation issue.
> >
> > I'm not thrilled with asking Ants to implement a solution to SSL
> > passphrases, and generalizing it to work for this, to get this feature
> > accepted.  I assume that the reason for asking for that work to be done
> > now is because we decided that the current approach for SSL sucks but we
> > couldn't actually drop support for it, but we don't want to add other
> > features which work in a similar way because, well, it sucks.
>
> My point is that if our support for db-level encryption is as bad as SSL
> key passwords, then it will be nearly useless, so we might as well not
> have it.  Isn't that obvious?

Well, no, because the reason we even have an approach at all for SSL key
passwords is because multiple people (myself and Magnus, at least, as I
recall) complained as we are aware of installations which are actively
using that approach.

That doesn't mean it's a great solution or that it doesn't suck- really,
I tend to agree that it does, but it's necessary because we need a
solution, it works, and users are using it.  Having a better solution
would be great and something agent-based might be the right answer (or
perhaps something where we have support for using hardware accellerators
through an existing library...).

I expect the same would happen with the shell-command approach suggested
up-thread and the prompt-on-stdin approach too, they aren't great but I
expect users would still use the feature.  As Robert and I have
mentioned, there is a good bit of value to having this feature simply
because it avoids the need to get someone with root privileges to set up
an encrypted volume and I don't think having to use a shell command or
providing the password on stdin at startup really changes that very
much.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Thu, Jun 15, 2017 at 07:27:55PM -0400, Stephen Frost wrote:
> I expect the same would happen with the shell-command approach suggested
> up-thread and the prompt-on-stdin approach too, they aren't great but I
> expect users would still use the feature.  As Robert and I have
> mentioned, there is a good bit of value to having this feature simply
> because it avoids the need to get someone with root privileges to set up
> an encrypted volume and I don't think having to use a shell command or
> providing the password on stdin at startup really changes that very
> much.

Understood, but now you are promoting a feature with an admittedly-poor
API, duplication of an OS feature, and perhaps an invasive change to the
code.  Those are high hurdles.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Jun 15, 2017 at 07:27:55PM -0400, Stephen Frost wrote:
> > I expect the same would happen with the shell-command approach suggested
> > up-thread and the prompt-on-stdin approach too, they aren't great but I
> > expect users would still use the feature.  As Robert and I have
> > mentioned, there is a good bit of value to having this feature simply
> > because it avoids the need to get someone with root privileges to set up
> > an encrypted volume and I don't think having to use a shell command or
> > providing the password on stdin at startup really changes that very
> > much.
>
> Understood, but now you are promoting a feature with an admittedly-poor
> API, duplication of an OS feature, and perhaps an invasive change to the
> code.  Those are high hurdles.

Of those, the only one that worries me, at least, is that it might be an
invasive and difficult to maintain code change.  As Robert said, and I
agree with, "duplication of an OS feature" is something we pretty
routinly, and justifiably, do.  The poor interface is unfortunate, but
if it's consistent with what we have today for a similar feature then
I'm really not too upset with it.  If we can do better, great, I'm all
for that, but if not, then I'd rather have the feature with the poor
interface than not have it at all.

If it's an invasive code change or one which ends up being difficult to
maintain, then that's a problem.  Getting some focus on that aspect
would be great and I certainly appreciate Robert's initial review and
commentary on it.

Thanks!

Stephen

Re: [HACKERS] WIP: Data at rest encryption

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
> On Thu, Jun 15, 2017 at 07:27:55PM -0400, Stephen Frost wrote:
> > I expect the same would happen with the shell-command approach suggested
> > up-thread and the prompt-on-stdin approach too, they aren't great but I
> > expect users would still use the feature.  As Robert and I have
> > mentioned, there is a good bit of value to having this feature simply
> > because it avoids the need to get someone with root privileges to set up
> > an encrypted volume and I don't think having to use a shell command or
> > providing the password on stdin at startup really changes that very
> > much.
> 
> Understood, but now you are promoting a feature with an admittedly-poor
> API, duplication of an OS feature, and perhaps an invasive change to the
> code.  Those are high hurdles.

I thought we called it "incremental development".  From the opposite
point of view, would you say we should ban use of passphrase-protected
SSL key files because the current user interface for them is bad?

I have no use for data-at-rest encryption myself, but I wouldn't stop
development just because the initial design proposal doesn't include
top-notch key management.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Thu, Jun 15, 2017 at 07:51:36PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Thu, Jun 15, 2017 at 07:27:55PM -0400, Stephen Frost wrote:
> > > I expect the same would happen with the shell-command approach suggested
> > > up-thread and the prompt-on-stdin approach too, they aren't great but I
> > > expect users would still use the feature.  As Robert and I have
> > > mentioned, there is a good bit of value to having this feature simply
> > > because it avoids the need to get someone with root privileges to set up
> > > an encrypted volume and I don't think having to use a shell command or
> > > providing the password on stdin at startup really changes that very
> > > much.
> > 
> > Understood, but now you are promoting a feature with an admittedly-poor
> > API, duplication of an OS feature, and perhaps an invasive change to the
> > code.  Those are high hurdles.
> 
> I thought we called it "incremental development".  From the opposite
> point of view, would you say we should ban use of passphrase-protected
> SSL key files because the current user interface for them is bad?
> 
> I have no use for data-at-rest encryption myself, but I wouldn't stop
> development just because the initial design proposal doesn't include
> top-notch key management.

Yes, but we have to have a plan on how to improve it.  Why add a feature
that is hard to maintain, and hard to use.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Andres Freund
Date:
On 2017-06-15 19:44:43 -0400, Bruce Momjian wrote:
> Understood, but now you are promoting a feature with an admittedly-poor
> API, duplication of an OS feature, and perhaps an invasive change to the
> code.

*Perhaps* an invasive change to the code?  To me it's pretty evident
that this'll be a pretty costly feature from that angle. We've quite a
few places that manipulate on-disk files, and they'll all have to be
manipulated. Several of those are essentially critical sections, adding
memory allocations to them wouldn't be good, so we'll need
pre-allocation APIs.

I've only skimmed the discussion, but based on that I'm very surprised
how few concerns about this feature's complexity / maintainability
impact have been raised.

- Andres



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Thu, Jun 15, 2017 at 04:56:36PM -0700, Andres Freund wrote:
> On 2017-06-15 19:44:43 -0400, Bruce Momjian wrote:
> > Understood, but now you are promoting a feature with an admittedly-poor
> > API, duplication of an OS feature, and perhaps an invasive change to the
> > code.
> 
> *Perhaps* an invasive change to the code?  To me it's pretty evident
> that this'll be a pretty costly feature from that angle. We've quite a
> few places that manipulate on-disk files, and they'll all have to be
> manipulated. Several of those are essentially critical sections, adding
> memory allocations to them wouldn't be good, so we'll need
> pre-allocation APIs.
> 
> I've only skimmed the discussion, but based on that I'm very surprised
> how few concerns about this feature's complexity / maintainability
> impact have been raised.

Yeah, I guess we will just have to wait to see it since other people are
excited about it.  My concern is code complexity and usability
challenges, vs punting the problem to the operating system, though
admittedly there are some cases where that is not possible.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Tatsuo Ishii
Date:
> Yeah, I guess we will just have to wait to see it since other people are
> excited about it.  My concern is code complexity and usability
> challenges, vs punting the problem to the operating system, though
> admittedly there are some cases where that is not possible.

Oracle sells this feature only with the expensive enterprise
edition. And people actually buy it. I guess the feature is pretty
important for some users.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: [HACKERS] WIP: Data at rest encryption

From
Konstantin Knizhnik
Date:

On 16.06.2017 03:08, Bruce Momjian wrote:
>
> Yeah, I guess we will just have to wait to see it since other people are
> excited about it.  My concern is code complexity and usability
> challenges, vs punting the problem to the operating system, though
> admittedly there are some cases where that is not possible.
>

Let me also share my opinion about encryption and compression support at 
database level.
PostgresPro Enterprise does support both. I have made presentation about 
it at PgConn 2016 in Tallinn.
I was a little bit surprised that there were more questions about 
encryption than about compression.
But right now we have several customers which are using compression and 
none of them use encryption (just because them do not need
to protect their databases). But I absolutely sure that there are many 
Postgres users which first of all need to protect their data.

Encryption is much easier to implement than compression, because it is 
not changing page size. So I do not see any "complexity and flexibility 
challenges" here.
Just for reference I attached to this mail our own encryption patch. I 
do not want to propose it as alternative to Aasmas patch: it is less 
flexible and doesn't support encryption of WAL, just encryption of 
relation data. Also it doesn't allow custom encryption libraries: AES 
implementation is embedded. Encryption cipher is taken from environment 
variable. At Tallin's conferences I was informed about possible security 
issue with passing key through environment variable: it is possible to 
inspect server's environment variables using plpython/plperl stored 
procedure.
This is why we unset this environment variable after reading. I am not 
expect in security, but I do not know other issues with such solution.

Concerning the question whether to implement compression/encryption on 
database level or rely on OS, my opinion is that there are many 
scenarios where it is not possible or is not desirable to use OS level 
encryption/protection. It first of all includes cloud installations and 
embedded applications.  I do not want to repeat arguments already 
mentioned in this thread.
But the fact is that there are many people which really need 
compression/encryption support and them can not or do not want to 
redirect this aspects to OS. Almost all DBMSes are supporting 
compression encryption, so lack of this features in Postgres definitely 
can not be considered as Postgres advantage.

Postgres buffer manager interface significantly simplifies integration 
of encryption and compression. There is actually single path through 
which data is fetched/stored to the disk.
It is most obvious and natural solution to decompress/decrypt data when 
it is read from the disk to page pool and compress/encrypt it when it is 
written back. Taken in account that memory is cheap now and many 
databases can completely fit in memory, storing pages in the buffer 
cache in plain (decompressed/decrypted) format allows to minimize 
overhead of compression/encryption and its influence on performance. For 
read only queries working with cached data performance will be exactly 
the same as without encryption/compression.
Write speed for encrypted pages will be certainly slightly worse, but 
still encryption speed is much higher than disk IO speed.

So I do not think that it is really necessary to support encryption of 
some particular tables, storing "non-secrete" data in plain format 
without encryption. It should not cause noticeable  improve of 
performance, but may complicate implementation and increase possibility 
of leaking secure data.

I do not think that pluggable storage API is right approach to integrate 
compression and especially encryption. It is better to plugin encryption 
between buffer manager and storage device,
allowing to use  it with any storage implementation. Also it is not 
clear to me whether encryption of WAL can be provided using pluggable 
storage API.

The last discussed question is whether it is necessary to encrypt 
temporary data (BufFile). In our solution we encrypt only main fork of 
non-system relations and do no encrypt temporary relations. It may cause 
that some secrete data will be stored at this disk in non-encrypted 
format. But accessing this data is not trivial. You can not just 
copy/stole disk, open database and do "select * from SecreteTable": you 
will have to extract data from raw file yourself. So looks like it is 
better to allow user to make choice whether to encrypt temporary data or 
not.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Thu, Jun 15, 2017 at 08:08:05PM -0400, Bruce Momjian wrote:
> On Thu, Jun 15, 2017 at 04:56:36PM -0700, Andres Freund wrote:
> > how few concerns about this feature's complexity / maintainability
> > impact have been raised.
> 
> Yeah, I guess we will just have to wait to see it since other people are
> excited about it.  My concern is code complexity and usability
> challenges, vs punting the problem to the operating system, though
> admittedly there are some cases where that is not possible.

I know some OS's can create file systems inside files.  Can you encrypt
such file storage as non-root?  I assume that is just too odd.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Bruce Momjian
Date:
On Fri, Jun 16, 2017 at 11:06:39AM +0300, Konstantin Knizhnik wrote:
> Encryption is much easier to implement than compression, because it is not
> changing page size. So I do not see any "complexity and flexibility
> challenges" here.
> Just for reference I attached to this mail our own encryption patch. I do

I didn't see you using CPU AES instructions, which can improve
performance by 3-10x.  Is there a reason?

> Postgres buffer manager interface significantly simplifies integration of
> encryption and compression. There is actually single path through which data
> is fetched/stored to the disk.
> It is most obvious and natural solution to decompress/decrypt data when it
> is read from the disk to page pool and compress/encrypt it when it is
> written back. Taken in account that memory is cheap now and many databases
> can completely fit in memory, storing pages in the buffer cache in plain
> (decompressed/decrypted) format allows to minimize overhead of
> compression/encryption and its influence on performance. For read only
> queries working with cached data performance will be exactly the same as
> without encryption/compression.
> Write speed for encrypted pages will be certainly slightly worse, but still
> encryption speed is much higher than disk IO speed.

Good point.

> I do not think that pluggable storage API is right approach to integrate
> compression and especially encryption. It is better to plugin encryption
> between buffer manager and storage device,
> allowing to use  it with any storage implementation. Also it is not clear to
> me whether encryption of WAL can be provided using pluggable storage API.

Yes, you are completely correct.  I withdraw my suggestion of doing it
as plugin storage.

> The last discussed question is whether it is necessary to encrypt temporary
> data (BufFile). In our solution we encrypt only main fork of non-system
> relations and do no encrypt temporary relations. It may cause that some
> secrete data will be stored at this disk in non-encrypted format. But
> accessing this data is not trivial. You can not just copy/stole disk, open
> database and do "select * from SecreteTable": you will have to extract data
> from raw file yourself. So looks like it is better to allow user to make
> choice whether to encrypt temporary data or not.

If we go forward with in-db encryption, I think we are going to have to
have a discussion about what parts of PGDATA need to be encrypted,
i.e., I don't think pg_clog needs encryption.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WIP: Data at rest encryption

From
Tomas Vondra
Date:
Hi all,

I've noticed this thread got resurrected a few days ago, but I haven't 
managed to read all the messages until today. I do have a bunch of 
comments, but let me share them as a single consistent message instead 
of sending a thousand responses to individual messages.


1) Threat model
---------------

Firstly, I think the thread would seriously benefit from an explanation 
and discussion of the threat model - what types of attacks it's meant to 
address, and what attacks it can't defend against.

My understanding is that data-at-rest encryption generally addresses 
only the "someone stole the disk" case and pretty much nothing else.

Moreover, I claim that encryption implemented at the database-level is 
strictly weaker compared to LUKS or encrypted disks, because it simply 
reveals a lot more information even without decryption (file sizes, 
timestamps, etc.)

That is a serious issue in practice, and researches have been proving 
that for a long time now. I do recommend this paper from Cornell Tech as 
a great starting point (it cites many papers relevant to this thread):
    Why Your Encrypted Database Is Not Secure    Paul Grubbs, Thomas Ristenpart, Vitaly Schmatikov
http://eprint.iacr.org/2017/468.pdf

The paper explains how encryption schemes on general-purpose databases 
fail, due to exactly such side-channels. MVCC, logging and other side 
channels turn all attackers into "persistent passive attackers".

Now, this does not mean the feature is useless - nothing is perfect, and 
security is not a binary feature. It certainly makes attacks mode 
difficult compared to plaintext database. But it's untrue that it's 
basically LUKS, just implemented at the database level.

I'm not suggesting that we should not pursue this idea, but the threat 
model is a crucial piece of information missing in this thread.


2) How do other databases do it?
--------------------------------

It was repeatedly mentioned that other databases support this type of 
encryption. So how do they deal with the hard parts? For example how do 
they get and protect the encryption key?

I agree with Stephen that we should not require a full key management 
from v1 of the patch, that's an incredibly difficult thing. And it 
largely depends on the hardware (e.g. it should be possible to move the 
key to TrustZone on ARM / SGX on Intel).


3) Why do users prefer this to FDE?
-----------------------------------

I suppose we're discussing this feature because we've been asked about 
it by users/customers who can't use FDE. Can we reach to them and ask 
them about the reasons? Why can't they use FDE?

It was mentioned in the thread that the FDE solutions are not portable 
between different systems, but honestly - is that an issue? You probably 
can't copy the datadir anyway due locale differences anyway. If you're 
running multiple operating systems, FDE is just one of many differences.


4) Other solutions?
-------------------

Clearly, FDE (at the block device level) and DB-level encryption are not 
the only solutions. There are also filesystems-level solutions now, for 
example.

ext4 (since kernel 4.1) and f2fs (since kernel 4.2) allow encryption at 
directory level, are transparent to the user space, and include things 
like key management (well, that's provided by kernel). NTFS can do 
something quite similar using EFS.
    https://lwn.net/Articles/639427/
https://blog.quarkslab.com/a-glimpse-of-ext4-filesystem-level-encryption.html

Of course, if you happen to use another filesystem (e.g. XFS), this 
won't work for you. But then there's eCryptfs, for example:
    https://en.wikipedia.org/wiki/ECryptfs


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP: Data at rest encryption

From
Robert Haas
Date:
On Thu, Jun 15, 2017 at 7:51 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I thought we called it "incremental development".  From the opposite
> point of view, would you say we should ban use of passphrase-protected
> SSL key files because the current user interface for them is bad?

I think that we've got a number of features which exist in the tree
today only because either (a) our standards were lower at the time
that those features were committed than they are today or (b) we
didn't realize how much trouble those features were going to create.
Just because we don't want to hose the people already using those
features does not mean that we want more features engineered to that
same quality level.  Obviously, there's room for debate in any
particular case about how reasonable it is to expect someone who wants
to implement A to also improve B, and, well, maybe handling thing as
we do SSL certificates is good enough for this feature, too.  I find
myself a bit skeptical about that, though.  It preclude as lot of
things we might want to do.  You're not going to be able to interface
with some external key management server that way, nor do encryption
of only part of the database, nor have multiple keys for different
parts of the database using that kind of setup.

One could argue that can all be added later, but I think there's a
question about how much that's going to affect the code structure.
Surely we don't want to install encryption v1 and then find that, by
not considering key management, we've made it really hard to get to
v2, and that it basically can't be done without ripping the whole
implementation out and replacing it.  Maybe the database needs, at
some rather low level, a concept of whether the encryption key (or an
encryption key) is available yet, and maybe you get out of considering
that by deciding you're just going to prompt very early in startup,
but now when someone comes along and wants to improve things later,
and they've got to try to go find all of the code that depends on the
assumption that the key is always available and fix it.  That could be
pretty annoying to validate.  I think it's better to give at least
some consideration to these key management questions from the
beginning, lest we back ourselves into a corner.  Whether or not the
SSL-passphrase style implementation is above or below the level we'd
consider a minimally viable feature, it's surely not where we want to
end up, and we shouldn't do anything that makes it likely that we'll
get stuck at precisely that point.

Also, practically, I think that type of solution is going to be
extremely difficult to use for most people.  It means that the
database can't be started by the system startup scripts; you'll have
to log into the PG OS user account and launch the database from there.
IIUC, that means it won't be able to be controlled by things like
systemd, that just know about start and stop, but not about ask for a
password in the middle.  Maybe I'm wrong about that, though.  And
certainly, there will be some users for whom starting the database
manually and prompting for a password will be good enough, if not
ideal.  But for people who want to fetch the encryption key from a key
management server, which I bet is a lot of people, that's not really
going to be good enough.  I'm not really sure that rushing a first
patch that "works" for sufficiently small values of "works" is
actually going

> I have no use for data-at-rest encryption myself, but I wouldn't stop
> development just because the initial design proposal doesn't include
> top-notch key management.


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Data at rest encryption

From
Robert Haas
Date:
On Mon, Jun 19, 2017 at 12:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 15, 2017 at 7:51 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I thought we called it "incremental development".  From the opposite
>> point of view, would you say we should ban use of passphrase-protected
>> SSL key files because the current user interface for them is bad?
>
> I think that we've got a number of features which exist in the tree
> today only because either (a) our standards were lower at the time
> that those features were committed than they are today or (b) we
> didn't realize how much trouble those features were going to create.
> Just because we don't want to hose the people already using those
> features does not mean that we want more features engineered to that
> same quality level.  Obviously, there's room for debate in any
> particular case about how reasonable it is to expect someone who wants
> to implement A to also improve B, and, well, maybe handling thing as
> we do SSL certificates is good enough for this feature, too.  I find
> myself a bit skeptical about that, though.  It preclude as lot of
> things we might want to do.  You're not going to be able to interface
> with some external key management server that way, nor do encryption
> of only part of the database, nor have multiple keys for different
> parts of the database using that kind of setup.
>
> One could argue that can all be added later, but I think there's a
> question about how much that's going to affect the code structure.
> Surely we don't want to install encryption v1 and then find that, by
> not considering key management, we've made it really hard to get to
> v2, and that it basically can't be done without ripping the whole
> implementation out and replacing it.  Maybe the database needs, at
> some rather low level, a concept of whether the encryption key (or an
> encryption key) is available yet, and maybe you get out of considering
> that by deciding you're just going to prompt very early in startup,
> but now when someone comes along and wants to improve things later,
> and they've got to try to go find all of the code that depends on the
> assumption that the key is always available and fix it.  That could be
> pretty annoying to validate.  I think it's better to give at least
> some consideration to these key management questions from the
> beginning, lest we back ourselves into a corner.  Whether or not the
> SSL-passphrase style implementation is above or below the level we'd
> consider a minimally viable feature, it's surely not where we want to
> end up, and we shouldn't do anything that makes it likely that we'll
> get stuck at precisely that point.
>
> Also, practically, I think that type of solution is going to be
> extremely difficult to use for most people.  It means that the
> database can't be started by the system startup scripts; you'll have
> to log into the PG OS user account and launch the database from there.
> IIUC, that means it won't be able to be controlled by things like
> systemd, that just know about start and stop, but not about ask for a
> password in the middle.  Maybe I'm wrong about that, though.  And
> certainly, there will be some users for whom starting the database
> manually and prompting for a password will be good enough, if not
> ideal.  But for people who want to fetch the encryption key from a key
> management server, which I bet is a lot of people, that's not really
> going to be good enough.  I'm not really sure that rushing a first
> patch that "works" for sufficiently small values of "works" is
> actually going

...to move us forward very much.

>> I have no use for data-at-rest encryption myself, but I wouldn't stop
>> development just because the initial design proposal doesn't include
>> top-notch key management.

I agree with that, but there's a difference between "not top-notch"
and "pretty bad".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Data at rest encryption

From
Antonin Houska
Date:
Ants Aasma <ants.aasma@gmail.com> wrote:

> Attached to this mail is a work in progress patch that adds an
> extensible encryption mechanism. There are some loose ends left to tie
> up, but the general concept and architecture is at a point where it's
> ready for some feedback, fresh ideas and bikeshedding.

Rebased patch is attached here, in case it helps to achieve (some of) the
goals mentioned in the related thread [1].

Besides encrypting table and WAL pages, it encrypts the temporary files
(buffile.c), data stored during logical decoding (reorderbuffer.c) and
statistics temporary files (pgstat.c). Unlike the previous version, SLRU files
(e.g. CLOG) are not encrypted (it does not seem critical and the encryption
makes torn page write quite difficult to handle).

Another difference is that we use the OpenSSL of the (tweaked) AES XTS cipher
now.

Binary upgrade from unencrypted to encrypted cluster is not implemented yet.


[1] https://www.postgresql.org/message-id/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Attachment

Re: [HACKERS] WIP: Data at rest encryption

From
Toshi Harada
Date:
Hi.

I am interested in a patch of "WIP: Data at rest encryption".
This patch("data-at-rest-encryption-wip-2018.06.27.patch") is applied to PostgreSQL 11-beta 2 and it is running.

In the explanation of this patch, since "data stored during logical decoding" is written, 
we tried logical decoding by the test_decoding module, but the following error occurs when creating a slot.


pgbench_db=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
ERROR:  invalid magic number B419 in log segment 000000020000000000000010, offset 0
pgbench_db=#


(Also, if you run "CREATE SUNSCRIPTION" for logical replication from another server, a similar error will occur.)

Question.
In "data-at-rest-encryption-wip-2018.06.27.patch", is logical decoding still not implemented?
Or is there a need for another logical decoding plug-in for "WIP: Data at rest encryption"?

Regards.

Antonin Houska <ah@cybertec.at> wrote:
> Ants Aasma <ants.aasma@gmail.com> wrote:
> 
> > Attached to this mail is a work in progress patch that adds an
> > extensible encryption mechanism. There are some loose ends left to tie
> > up, but the general concept and architecture is at a point where it's
> > ready for some feedback, fresh ideas and bikeshedding.
> 
> Rebased patch is attached here, in case it helps to achieve (some of) the
> goals mentioned in the related thread [1].
> 
> Besides encrypting table and WAL pages, it encrypts the temporary files
> (buffile.c), data stored during logical decoding (reorderbuffer.c) and
> statistics temporary files (pgstat.c). Unlike the previous version, SLRU files
> (e.g. CLOG) are not encrypted (it does not seem critical and the encryption
> makes torn page write quite difficult to handle).
> 
> Another difference is that we use the OpenSSL of the (tweaked) AES XTS cipher
> now.
> 
> Binary upgrade from unencrypted to encrypted cluster is not implemented yet.
> 
> 
> [1] https://www.postgresql.org/message-id/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp
> 
> -- 
> Antonin Houska
> Cybertec Schonig & Schonig GmbH
> Grohrmuhlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
> 




"WIP: Data at rest encryption" patch and, 2 phase commit.

From
Toshi Harada
Date:
Hi.

I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 and I'm working on it.

When this patch is applied, the following problem occurs.

* An error occurs when CHECKPOINT is executed during two-phase commit.
* After an error occurs, if you stop PostgreSQL, it will never start again.

(1) First, execute PREPARE TRANSACTION.

postgres=# BEGIN;
BEGIN
postgres=# PREPARE TRANSACTION 'foo';
PREPARE TRANSACTION
postgres=#

(2) Execute the CHECKPOINT command from another terminal.
    CHEKPOINT command fails.

postgres=# CHECKPOINT;
ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.
postgres=#


(3) ROLLBACK PREPARED command also fails.

postgres=# ROLLBACK PREPARED 'foo';
ERROR:  could not read two-phase state from WAL at 0/167EBA0
postgres=# 


(4) Shut down the PostgreSQL server.
    During shutdown, a "could not read two-phase state from WAL" error occurs.


2018-07-23 14:49:08.924 JST [15821] LOG:  received fast shutdown request
2018-07-23 14:49:08.925 JST [15821] LOG:  aborting any active transactions
2018-07-23 14:49:08.925 JST [15831] FATAL:  terminating connection due to administrator command
2018-07-23 14:49:08.928 JST [15821] LOG:  background worker "logical replication launcher" (PID 15829) exited with exit
code1
 
2018-07-23 14:49:08.928 JST [15824] LOG:  shutting down
2018-07-23 14:49:08.935 JST [15824] FATAL:  could not read two-phase state from WAL at 0/167EBA0
2018-07-23 14:49:08.936 JST [15821] LOG:  checkpointer process (PID 15824) exited with exit code 1
2018-07-23 14:49:08.936 JST [15821] LOG:  terminating any other active server processes
2018-07-23 14:49:08.937 JST [15821] LOG:  abnormal database system shutdown
2018-07-23 14:49:08.945 JST [15821] LOG:  database system is shut down


(5) When restarting the PostgreSQL server, an error(could not read two-phase state from WAL) occurs 
    and the PostgreSQL server can not be started.

2018-07-23 14:49:42.489 JST [15864] LOG:  listening on IPv6 address "::1", port 5432
2018-07-23 14:49:42.489 JST [15864] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2018-07-23 14:49:42.492 JST [15864] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-07-23 14:49:42.521 JST [15866] LOG:  database system shutdown was interrupted; last known up at 2018-07-23
14:49:08JST
 
2018-07-23 14:49:42.674 JST [15866] LOG:  database system was not properly shut down; automatic recovery in progress
2018-07-23 14:49:42.676 JST [15866] LOG:  redo starts at 0/167EB60
2018-07-23 14:49:42.676 JST [15866] LOG:  invalid record length at 0/167EC70: wanted 24, got 0
2018-07-23 14:49:42.676 JST [15866] LOG:  redo done at 0/167EC30
2018-07-23 14:49:42.677 JST [15866] FATAL:  could not read two-phase state from WAL at 0/167EBA0
2018-07-23 14:49:42.678 JST [15864] LOG:  startup process (PID 15866) exited with exit code 1
2018-07-23 14:49:42.678 JST [15864] LOG:  aborting startup due to startup process failure
2018-07-23 14:49:42.682 JST [15864] LOG:  database system is shut down

Regards.

----
Harada Toshi.
NTT TechnoCross Corporation

Antonin Houska <ah@cybertec.at> wrote:
> Ants Aasma <ants.aasma@gmail.com> wrote:
> 
> > Attached to this mail is a work in progress patch that adds an
> > extensible encryption mechanism. There are some loose ends left to tie
> > up, but the general concept and architecture is at a point where it's
> > ready for some feedback, fresh ideas and bikeshedding.
> 
> Rebased patch is attached here, in case it helps to achieve (some of) the
> goals mentioned in the related thread [1].
> 
> Besides encrypting table and WAL pages, it encrypts the temporary files
> (buffile.c), data stored during logical decoding (reorderbuffer.c) and
> statistics temporary files (pgstat.c). Unlike the previous version, SLRU files
> (e.g. CLOG) are not encrypted (it does not seem critical and the encryption
> makes torn page write quite difficult to handle).
> 
> Another difference is that we use the OpenSSL of the (tweaked) AES XTS cipher
> now.
> 
> Binary upgrade from unencrypted to encrypted cluster is not implemented yet.
> 
> 
> [1] https://www.postgresql.org/message-id/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp
> 
> -- 
> Antonin Houska
> Cybertec Schonig & Schonig GmbH
> Grohrmuhlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
> 




Re: [HACKERS] WIP: Data at rest encryption

From
Antonin Houska
Date:
Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote:

> I am interested in a patch of "WIP: Data at rest encryption".
> This patch("data-at-rest-encryption-wip-2018.06.27.patch") is applied to PostgreSQL 11-beta 2 and it is running.
>
> In the explanation of this patch, since "data stored during logical decoding" is written,
> we tried logical decoding by the test_decoding module, but the following error occurs when creating a slot.
>
>
> pgbench_db=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
> ERROR:  invalid magic number B419 in log segment 000000020000000000000010, offset 0
> pgbench_db=#

I could not reproduce this error ...

> (Also, if you run "CREATE SUNSCRIPTION" for logical replication from another server, a similar error will occur.)

... but I succeeded for this. The problem was that a separate XLOG reader
callback is used for logical replication and that did not decrypt the XLOG
page. Fixed in the new version of the patch. Do you still see
pg_create_logical_replication_slot() failing?

> Question.
> In "data-at-rest-encryption-wip-2018.06.27.patch", is logical decoding still not implemented?
> Or is there a need for another logical decoding plug-in for "WIP: Data at rest encryption"?

No, there's nothing special about logical decoding.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Attachment

Re: "WIP: Data at rest encryption" patch and, 2 phase commit.

From
Antonin Houska
Date:
Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote:

> Hi.
>
> I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 and I'm working on it.
>
> When this patch is applied, the following problem occurs.
>
> * An error occurs when CHECKPOINT is executed during two-phase commit.
> * After an error occurs, if you stop PostgreSQL, it will never start again.
>
> (1) First, execute PREPARE TRANSACTION.
>
> postgres=# BEGIN;
> BEGIN
> postgres=# PREPARE TRANSACTION 'foo';
> PREPARE TRANSACTION
> postgres=#
>
> (2) Execute the CHECKPOINT command from another terminal.
>     CHEKPOINT command fails.
>
> postgres=# CHECKPOINT;
> ERROR:  checkpoint request failed
> HINT:  Consult recent messages in the server log for details.
> postgres=#

The patch version I posted in

https://www.postgresql.org/message-id/11678.1532519255%40localhost

fixes an issue (unitialized pointer) that caused failure here, but it was
SEGFAULT rather than ERROR. And the scope of the bug was broader than just
CHECKPOINT.

Can you please test it again with the new version of the patch?

Anyway, thanks for your reports!


--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Re: "WIP: Data at rest encryption" patch and, 2 phase commit.

From
Toshi Harada
Date:
Hi.

Applying https://www.postgresql.org/message-id/11678.1532519255%40localhost patch, 
the problem of pg_create_logical_replication_slot () and the 2PC problem were solved.

Thanks.

Antonin Houska <ah@cybertec.at> wrote:
> Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote:
> 
> > Hi.
> > 
> > I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 and I'm working on it.
> > 
> > When this patch is applied, the following problem occurs.
> > 
> > * An error occurs when CHECKPOINT is executed during two-phase commit.
> > * After an error occurs, if you stop PostgreSQL, it will never start again.
> > 
> > (1) First, execute PREPARE TRANSACTION.
> > 
> > postgres=# BEGIN;
> > BEGIN
> > postgres=# PREPARE TRANSACTION 'foo';
> > PREPARE TRANSACTION
> > postgres=#
> > 
> > (2) Execute the CHECKPOINT command from another terminal.
> >     CHEKPOINT command fails.
> > 
> > postgres=# CHECKPOINT;
> > ERROR:  checkpoint request failed
> > HINT:  Consult recent messages in the server log for details.
> > postgres=#
> 
> The patch version I posted in
> 
> https://www.postgresql.org/message-id/11678.1532519255%40localhost
> 
> fixes an issue (unitialized pointer) that caused failure here, but it was
> SEGFAULT rather than ERROR. And the scope of the bug was broader than just
> CHECKPOINT.
> 
> Can you please test it again with the new version of the patch?
> 
> Anyway, thanks for your reports!
> 
> 
> -- 
> Antonin Houska
> Cybertec Schonig & Schonig GmbH
> Grohrmuhlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
> 




"WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Toshi Harada
Date:
Hi.

If "data-at-rest-encryption-wip-2018.07.25.patch" is applied to PostgreSQL 11-beta3 released last week, 
patch execution will fail as follows.

----
patching file src/backend/replication/logical/reorderbuffer.c
Hunk #9 FAILED at 2464.

1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_rewind/pg_rewind.c.rej

1 out of 6 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/controldata.c.rej
----


(See the attached file for the entire patch log)

Antonin Houska <ah@cybertec.at> wrote:
> Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote:
> 
> > Hi.
> > 
> > I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 and I'm working on it.
> > 
> > When this patch is applied, the following problem occurs.
> > 
> > * An error occurs when CHECKPOINT is executed during two-phase commit.
> > * After an error occurs, if you stop PostgreSQL, it will never start again.
> > 
> > (1) First, execute PREPARE TRANSACTION.
> > 
> > postgres=# BEGIN;
> > BEGIN
> > postgres=# PREPARE TRANSACTION 'foo';
> > PREPARE TRANSACTION
> > postgres=#
> > 
> > (2) Execute the CHECKPOINT command from another terminal.
> >     CHEKPOINT command fails.
> > 
> > postgres=# CHECKPOINT;
> > ERROR:  checkpoint request failed
> > HINT:  Consult recent messages in the server log for details.
> > postgres=#
> 
> The patch version I posted in
> 
> https://www.postgresql.org/message-id/11678.1532519255%40localhost
> 
> fixes an issue (unitialized pointer) that caused failure here, but it was
> SEGFAULT rather than ERROR. And the scope of the bug was broader than just
> CHECKPOINT.
> 
> Can you please test it again with the new version of the patch?
> 
> Anyway, thanks for your reports!
> 
> 
> -- 
> Antonin Houska
> Cybertec Schonig & Schonig GmbH
> Grohrmuhlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
>

Attachment

Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote:

> Hi.
>
> If "data-at-rest-encryption-wip-2018.07.25.patch" is applied to PostgreSQL 11-beta3 released last week,
> patch execution will fail as follows.

> ----
> patching file src/backend/replication/logical/reorderbuffer.c
> Hunk #9 FAILED at 2464.
>
> 1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_rewind/pg_rewind.c.rej
>
> 1 out of 6 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/controldata.c.rej

The patch should be applicable to the master branch.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Toshi Harada
Date:
Hi.

Even if you apply "data-at-rest-encryption-wip-2018.07.25.patch" to the master branch, 
the same patch error will occur.
(Patch error of pg_rewind was eliminated)

----
patching file src/backend/replication/logical/reorderbuffer.c
Hunk #9 FAILED at 2464.

patching file src/bin/pg_upgrade/controldata.c
Hunk #1 FAILED at 58.
----
(See the attached file for the entire patch log)

Antonin Houska <ah@cybertec.at> wrote:
> Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote:
> 
> > Hi.
> > 
> > If "data-at-rest-encryption-wip-2018.07.25.patch" is applied to PostgreSQL 11-beta3 released last week, 
> > patch execution will fail as follows.
> 
> > ----
> > patching file src/backend/replication/logical/reorderbuffer.c
> > Hunk #9 FAILED at 2464.
> > 
> > 1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_rewind/pg_rewind.c.rej
> > 
> > 1 out of 6 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/controldata.c.rej
> 
> The patch should be applicable to the master branch.
> 
> -- 
> Antonin Houska
> Cybertec Schonig & Schonig GmbH
> Grohrmuhlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
>

Attachment

Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote:

> Even if you apply "data-at-rest-encryption-wip-2018.07.25.patch" to the master branch,
> the same patch error will occur.

The version attached to this email should be applicable to the current
branch. Sorry for the delay.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Attachment

Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Fri, Sep 14, 2018 at 10:04 AM Antonin Houska <ah@cybertec.at> wrote:
> Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote:
> > Even if you apply "data-at-rest-encryption-wip-2018.07.25.patch" to the master branch,
> > the same patch error will occur.
>
> The version attached to this email should be applicable to the current
> branch. Sorry for the delay.

I have a few comments on this patch.  It doesn't seem useful to go
through and make really detailed review comments on everything at this
stage, because it's pretty far from being committable, at least IMHO.
However, I have a few general thoughts.

First of all, this is an impressive piece of work.  It's obvious even
from looking at this quickly that a lot of energy has gone into making
this work, and it touches a lot of stuff, not without reason.  For
example, it's quite impressive that this touches things like the
write-ahead log, logical decoding, and the stats collector, all of
which write to disk.

However, this is also quite invasive.  It changes a lot of code and it
doesn't do so in a very predictable way.  It's not like you went
through and replaced every call to write() with a call to
SpecialEncyptionMagicWrite().  Rather, there are new #ifdef
USE_ENCRYPTION blocks all over the code base.  I think it will be
important to look for ways to refactor this functionality to reduce
that sort of thing to the absolute minimum possible.  Encryption needs
to be something that the code needs to know about here and there, but
not everywhere.

Some of the changes are things that could perhaps be done as separate,
preparatory patches.  For instance, pgstat.c gets a heavy refactoring
to make it do I/O in bigger chunks.  There's no reason that you
couldn't separate some of those changes out, clean them up, and get
them committed separately.  It would be more efficient even as things
stand.  OK, well, there is one reason: we may be getting a
shared-memory stats collector soon, and if we do, then the need for
all of this will go away.  But the general point is valid: whatever is
a useful cleanup apart from this patch should be done separately from
this patch.

As far as possible, we need to try to do things in the same way in the
encrypted and non-encrypted cases.  For example, it's pretty hard to
feel good about code like this:

+ sz_hdr = sizeof(ReorderBufferDiskChange);
+ if (data_encrypted)
+#ifdef USE_ENCRYPTION
+ sz_hdr = TYPEALIGN(ENCRYPTION_BLOCK, sz_hdr);
+#else
+ ENCRYPTION_NOT_SUPPORTED_MSG;
+#endif /* USE_ENCRYPTION */

You won't be able to have much confidence that this code works both
with and without encryption unless you test it both ways, because the
file format is different in those two cases, and not just by being
encrypted.  That means that anyone who modifies this code in the
future has two ways of breaking it.  They can break the normal case,
or they can break the encrypted case.  If encryption were available as
a sort of service that everyone could use, then that would probably be
fine, but like I said above, I think something as invasive as this
currently is will lead to a lot of complaints.

The code needs a lot more documentation, not only SGML documentation
but also code comments and probably a README file someplace.  There is
a lot of discussion in the comments here of encryption tweaks, but
there's no general introduction to the topic (what's a tweak?) or --
and I think this is important -- what the idea between the choice of
various tweaks in different places actually is.  There's probably some
motivating philosophy behind the way the tweaks are chosen that should
be explained someplace.  Think about it this way: if some hapless
developer, let's say me, wants to add a new module to PostgreSQL which
happens to write data to disk, that person needs an easy way to
understand what they need to do to make that new code play nice with
encryption.  Right now, the code for every existing module is a little
different (uggh) and there's this encryption tweak stuff that you
about which you have to do something intelligent, but there's nothing
that tells you how to be intelligent about it, at least not that I
see.  People are going to need something that looks more like a
formula that they can just copy, and a good introduction to the
principles in some appropriate place, too.

The interaction of this capability with certain tricks that PostgreSQL
plays needs some thought -- and documentation, at least developer
documentation, maybe user documentation.  One area is recovery.
Writing WAL relies on the fact that if you are in the midst of
rewriting a block and the power fails, bytes that are the same in both
the old and new block images will be undisturbed; encryption will make
that false.  How's that going to work?  Hint bits also rely on this
principle.  I thought there might be some interaction between this
work and wal_log_hints for this reason, but I see nothing of that sort
in the patch.  Full page writes seem related too; I don't know how
something like this can be safe without both full_page_writes and
wal_log_hints forced on.  I don't know whether using encryption should
forcibly override those settings or whether it should just refuse to
start unless they are set properly, but I think it probably needs to
do one or the other.

Hope this helps in some way.  I think it would be good if this effort
went forward in some way.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Thanks for feedback!

Robert Haas <robertmhaas@gmail.com> wrote:

> However, this is also quite invasive.  It changes a lot of code and it
> doesn't do so in a very predictable way.  It's not like you went
> through and replaced every call to write() with a call to
> SpecialEncyptionMagicWrite().  Rather, there are new #ifdef
> USE_ENCRYPTION blocks all over the code base.  I think it will be
> important to look for ways to refactor this functionality to reduce
> that sort of thing to the absolute minimum possible.  Encryption needs
> to be something that the code needs to know about here and there, but
> not everywhere.

The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the code
needs to compile even w/o OpenSSL (of course the encryption won't be enabled
in that case). I'll try to reduce the use of this construct only to the code
blocks that really reference the OpenSSL functions.

> Some of the changes are things that could perhaps be done as separate,
> preparatory patches.  For instance, pgstat.c gets a heavy refactoring
> to make it do I/O in bigger chunks.  There's no reason that you
> couldn't separate some of those changes out, clean them up, and get
> them committed separately.  It would be more efficient even as things
> stand.  OK, well, there is one reason: we may be getting a
> shared-memory stats collector soon, and if we do, then the need for
> all of this will go away.  But the general point is valid: whatever is
> a useful cleanup apart from this patch should be done separately from
> this patch.

We'll omit the stats collector related changes so far because the patch [1] is
under active development, and that would require us to rebase our patch
often. And if the stats files will eventually go away, we won't need to
encrypt the statistics.

> As far as possible, we need to try to do things in the same way in the
> encrypted and non-encrypted cases.  For example, it's pretty hard to
> feel good about code like this:
> 
> + sz_hdr = sizeof(ReorderBufferDiskChange);
> + if (data_encrypted)
> +#ifdef USE_ENCRYPTION
> + sz_hdr = TYPEALIGN(ENCRYPTION_BLOCK, sz_hdr);
> +#else
> + ENCRYPTION_NOT_SUPPORTED_MSG;
> +#endif /* USE_ENCRYPTION */
> 
> You won't be able to have much confidence that this code works both
> with and without encryption unless you test it both ways, because the
> file format is different in those two cases, and not just by being
> encrypted.  That means that anyone who modifies this code in the
> future has two ways of breaking it.  They can break the normal case,
> or they can break the encrypted case.  If encryption were available as
> a sort of service that everyone could use, then that would probably be
> fine, but like I said above, I think something as invasive as this
> currently is will lead to a lot of complaints.

The problem I tried to solve here is that the whole encryption block (16
bytes) needs to be read and decrypted even if you need only part of its
data. In the snippet above I tried to ensure that the whole blocks are always
read, but I agree this approach is fragile.

Since buffile.c needs to handle this kind of problem (and it already does in
the last patch version), I think even other than temporary files could be
handled by this module. The patch below adds functions BufFileOpenTransient(),
BufFileWriteTransient(), etc. that can replace OpenTransientFile, write(),
etc. respectively. Once we implement the encryption, these functions will also
hide handling of the encryption blocks from user. In this (preliminary) patch
As an example, I applied this approach to ReorderBufferSerializeChange() and
the function seems a bit simpler now. (ReorderBufferRestoreChange() would need
more work to adopt this approach.)

> The code needs a lot more documentation, not only SGML documentation
> but also code comments and probably a README file someplace.

ok, we'll do that.

> The interaction of this capability with certain tricks that PostgreSQL
> plays needs some thought -- and documentation, at least developer
> documentation, maybe user documentation.  One area is recovery.
> Writing WAL relies on the fact that if you are in the midst of
> rewriting a block and the power fails, bytes that are the same in both
> the old and new block images will be undisturbed; encryption will make
> that false.  How's that going to work?

Yes, if only a single bit is different, decryption will turn the whole
encryption block (16 bytes) into garbage. So we need to enforce
full_page_writes=on if encryption is enabled. The last version of the patch
does not do that.

> Hint bits also rely on this principle.  I thought there might be some
> interaction between this work and wal_log_hints for this reason, but I see
> nothing of that sort in the patch.

I'm not sure if the hint bit is still a problem if we first copy the shared
buffer to backend-local memory and encrypt it there. That's what the patch
does.

> Full page writes seem related too; I don't know how something like this can
> be safe without both full_page_writes and wal_log_hints forced on.  I don't
> know whether using encryption should forcibly override those settings or
> whether it should just refuse to start unless they are set properly, but I
> think it probably needs to do one or the other.

Maybe it's more polite to refuse to start so that user knows what's going
on. I'm not sure if PG ever changes any configuration variable forcibly.

> Hope this helps in some way.  I think it would be good if this effort
> went forward in some way.

Sure, it helps! Thanks.


[1] https://commitfest.postgresql.org/22/1708/

-- 
Antonin Houska
https://www.cybertec-postgresql.com


Attachment

Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska <ah@cybertec.at> wrote:
> The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the code
> needs to compile even w/o OpenSSL (of course the encryption won't be enabled
> in that case). I'll try to reduce the use of this construct only to the code
> blocks that really reference the OpenSSL functions.

That sounds like a good place to start.  The other part will be
avoiding referencing the OpenSSL functions from too many places.  For
example, right now we have secure_read() -- which I know some people
hate, or at least hate the name of] -- which lets you send data to the
client without having to care whether that data is going to be
SSL-encrypted on its way over the network.  One can argue about
whether that particular abstraction is properly-designed, but as Tom
recently felt the urge to remind me on another thread, the value of
abstraction in general is not in doubt.  I think we need one here, so
that the SSL-specific bits can be isolated in a relatively small
number of files, and then clients can ask to write a file, or write a
block to a file, or whatever, and ignore the question of whether that
is going to get encrypted somehow behind the scenes.

> Since buffile.c needs to handle this kind of problem (and it already does in
> the last patch version), I think even other than temporary files could be
> handled by this module. The patch below adds functions BufFileOpenTransient(),
> BufFileWriteTransient(), etc. that can replace OpenTransientFile, write(),
> etc. respectively. Once we implement the encryption, these functions will also
> hide handling of the encryption blocks from user. In this (preliminary) patch
> As an example, I applied this approach to ReorderBufferSerializeChange() and
> the function seems a bit simpler now. (ReorderBufferRestoreChange() would need
> more work to adopt this approach.)

Yeah, I don't have time to look at this in detail right now, but this
kind of thing seems like a promising approach to me.

> > The interaction of this capability with certain tricks that PostgreSQL
> > plays needs some thought -- and documentation, at least developer
> > documentation, maybe user documentation.  One area is recovery.
> > Writing WAL relies on the fact that if you are in the midst of
> > rewriting a block and the power fails, bytes that are the same in both
> > the old and new block images will be undisturbed; encryption will make
> > that false.  How's that going to work?
>
> Yes, if only a single bit is different, decryption will turn the whole
> encryption block (16 bytes) into garbage. So we need to enforce
> full_page_writes=on if encryption is enabled. The last version of the patch
> does not do that.

I believe that we need full_page_writes=on and wal_log_hints=on, but
they don't fix the problem that I'm describing.  Suppose that the very
last page in the write-ahead log contains 3.5kB of good data and 7kB
of zeroes.  Now, somebody comes along and writes a new write-ahead log
record that is 1k in size.  In memory, we update the block: it now
contains 4.5kB of good data and 6.5kB of zeroes.  Now, as we're
writing the block to do disk, the power is removed.  Consequently, the
first 4kB hits the platter and the other 4kB disappears into the void.

If the WAL is not encrypted, the state at this point is that the 3.5kB
of good data that we had previously is still there, and the first 500
bytes of the new WAL record are also there, and the rest is missing.
Recovery will detect that there's a valid-looking record at end of
WAL, but when it tries to checksum that record, it will fail, and so
that trailing half-record will just get ignored.  And that's fine.

If the WAL *is* encrypted, the state at this point is that the block
is unreadable, because the first 4kB of the block is the first half of
the bits that resulted from encrypting 8kB of data that includes the
new record, and the second 4kB of the block is the second half of the
bits that resulted from encrypting 8kB of data that did not include
the new record, and since the new record perturbed every bit on the
page with probability ~50%, what that means is you now have garbage.
That means that not only did we lose the new record, but we also lost
the 3.5kB of good data which the page previously contained.  That's
NOT ok.  Some of the changes associated with those WAL records may
have been flushed to disk already, or there may be commits in there
that were acknowledged to the client, and we can't just lose them.

One idea I have for fixing this problem is to encrypt the individual
WAL records rather than the blocks.  That leaks some information, of
course, but maybe not in a way that really matters.

> > Hint bits also rely on this principle.  I thought there might be some
> > interaction between this work and wal_log_hints for this reason, but I see
> > nothing of that sort in the patch.
>
> I'm not sure if the hint bit is still a problem if we first copy the shared
> buffer to backend-local memory and encrypt it there. That's what the patch
> does.

That has the same problem that I described in the previous paragraph,
except for the relation data files rather than the WAL itself.  See
the manual's description of wal_log_hints:

       <para>
        When this parameter is <literal>on</literal>, the
<productname>PostgreSQL</productname>
        server writes the entire content of each disk page to WAL during the
        first modification of that page after a checkpoint, even for
        non-critical modifications of so-called hint bits.
       </para>

For encryption to work, you need that.  A hint bit write might convert
the page to garbage, just as in the previous example, and this option
make sure that if that happens, there will be an FPW, allowing you to
recover... unless of course you also hit the problem I described
above, but assuming that's fixed somehow you still have this issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> If the WAL *is* encrypted, the state at this point is that the block
> is unreadable, because the first 4kB of the block is the first half of
> the bits that resulted from encrypting 8kB of data that includes the
> new record, and the second 4kB of the block is the second half of the
> bits that resulted from encrypting 8kB of data that did not include
> the new record, and since the new record perturbed every bit on the
> page with probability ~50%, what that means is you now have garbage.
> That means that not only did we lose the new record, but we also lost
> the 3.5kB of good data which the page previously contained.  That's
> NOT ok.  Some of the changes associated with those WAL records may
> have been flushed to disk already, or there may be commits in there
> that were acknowledged to the client, and we can't just lose them.

ISTM that this is only a problem if you choose the wrong encryption
method.  One not-wrong encryption method is to use a stream cipher
--- maybe that's not the exact right technical term, but anyway
I'm talking about a method which notionally XOR's the cleartext
data with a random bit stream generated from the encryption key
(probably along with other knowable inputs such as the block number).
In such a method, corruption of individual on-disk bytes doesn't
prevent you from getting the correct decryption of on-disk bytes
that aren't corrupted.

            regards, tom lane


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Fri, Mar 15, 2019 at 5:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ISTM that this is only a problem if you choose the wrong encryption
> method.  One not-wrong encryption method is to use a stream cipher
> --- maybe that's not the exact right technical term, but anyway
> I'm talking about a method which notionally XOR's the cleartext
> data with a random bit stream generated from the encryption key
> (probably along with other knowable inputs such as the block number).
> In such a method, corruption of individual on-disk bytes doesn't
> prevent you from getting the correct decryption of on-disk bytes
> that aren't corrupted.

Oh, that seems like it might be a really good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
> > If the WAL *is* encrypted, the state at this point is that the block
> > is unreadable, because the first 4kB of the block is the first half of
> > the bits that resulted from encrypting 8kB of data that includes the
> > new record, and the second 4kB of the block is the second half of the
> > bits that resulted from encrypting 8kB of data that did not include
> > the new record, and since the new record perturbed every bit on the
> > page with probability ~50%, what that means is you now have garbage.
> > That means that not only did we lose the new record, but we also lost
> > the 3.5kB of good data which the page previously contained.  That's
> > NOT ok.  Some of the changes associated with those WAL records may
> > have been flushed to disk already, or there may be commits in there
> > that were acknowledged to the client, and we can't just lose them.
> 
> ISTM that this is only a problem if you choose the wrong encryption
> method.  One not-wrong encryption method is to use a stream cipher
> --- maybe that's not the exact right technical term, but anyway
> I'm talking about a method which notionally XOR's the cleartext
> data with a random bit stream generated from the encryption key
> (probably along with other knowable inputs such as the block number).
> In such a method, corruption of individual on-disk bytes doesn't
> prevent you from getting the correct decryption of on-disk bytes
> that aren't corrupted.

We actually use a block cipher (with block size 16 bytes), as opposed to
stream cipher. It's true that partial write is a problem because if a single
bit of the cipher text changed, decryption will produce 16 bytes of
garbage. However I'm not sure if partial write can affect as small unit as 16
bytes.

Nevertheless, with the current version of our patch, PG should be resistant
against such a partial write anyway because we chose to align XLOG records to
16 bytes (as long as the encryption is enabled) for the following reasons:

If one XLOG record ends and the following one starts in the same encryption
block, both records can get corrupted during streaming replication. The
scenario looks like: 1) the first record is written on master (the unused part
of the block contains zeroes), 2) the block is encrypted and its initial part
(i.e. the number of bytes occupied by the first record in the plain text) is
streamed to slave, 3) the second record is written on master, 4) the
containing encryption block is encrypted again and the trailing part (i.e. the
number of bytes occupied by the second record) is streamed, 5) decryption of
the block on slave will produce garbage and thus corrupt both records. This is
because the trailing part of the block was filled with zeroes during
encryption, but it contains different data at decryption time.

Alternative approach to this replication problem is that walsender decrypts
the stream and walreceiver encrypts it again. While this can provide us with
the advantage to have master and slave encrypted with different keys, this
approach brings some additional complexity. For example, pg_basebackup would
need to deal with encryption.

This design decision can be changed, but there's one more thing to consider:
if the XLOG stream is decrypted, the decryption cannot be disabled unless the
XLOG records are aligned to 16 bytes (and in turn, the XLOG alignment cannot
be enabled w/o initdb).

-- 
Antonin Houska
https://www.cybertec-postgresql.com


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Thu, Mar 21, 2019 at 7:46 AM Antonin Houska <ah@cybertec.at> wrote:
> Nevertheless, with the current version of our patch, PG should be resistant
> against such a partial write anyway because we chose to align XLOG records to
> 16 bytes (as long as the encryption is enabled) for the following reasons:
>
> If one XLOG record ends and the following one starts in the same encryption
> block, both records can get corrupted during streaming replication. The
> scenario looks like: 1) the first record is written on master (the unused part
> of the block contains zeroes), 2) the block is encrypted and its initial part
> (i.e. the number of bytes occupied by the first record in the plain text) is
> streamed to slave, 3) the second record is written on master, 4) the
> containing encryption block is encrypted again and the trailing part (i.e. the
> number of bytes occupied by the second record) is streamed, 5) decryption of
> the block on slave will produce garbage and thus corrupt both records. This is
> because the trailing part of the block was filled with zeroes during
> encryption, but it contains different data at decryption time.

Wouldn't Tom's proposal to use a stream cipher fix all this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska <ah@cybertec.at> wrote:
> > > Hint bits also rely on this principle.  I thought there might be some
> > > interaction between this work and wal_log_hints for this reason, but I see
> > > nothing of that sort in the patch.
> >
> > I'm not sure if the hint bit is still a problem if we first copy the shared
> > buffer to backend-local memory and encrypt it there. That's what the patch
> > does.
>
> That has the same problem that I described in the previous paragraph,
> except for the relation data files rather than the WAL itself.  See
> the manual's description of wal_log_hints:
>
>        <para>
>         When this parameter is <literal>on</literal>, the
> <productname>PostgreSQL</productname>
>         server writes the entire content of each disk page to WAL during the
>         first modification of that page after a checkpoint, even for
>         non-critical modifications of so-called hint bits.
>        </para>
>
> For encryption to work, you need that.  A hint bit write might convert
> the page to garbage, just as in the previous example, and this option
> make sure that if that happens, there will be an FPW, allowing you to
> recover...

I think I finally understand. Originally I thought the question is how to
compute correct page checksum while the hint bits can be changed w/o exclusive
lock on the buffer. Now I realize that it's more about *recovery*: if the hint
bit change is followed by a torn page write, the hint bit can get changed on
disk but the checksum might not get updated. The wrong checksum is detected
during recovery, but if XLOG does not contain the corresponding full page
image, we're not able to recover.

And with encryption, the consequence is even worse because torn page write
causes not only wrong checksum of otherwise useful page, but really damaged
page.

I'll enforce the FPW in the next version of the patch.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Thu, Apr 4, 2019 at 9:57 AM Antonin Houska <ah@cybertec.at> wrote:
> I think I finally understand. Originally I thought the question is how to
> compute correct page checksum while the hint bits can be changed w/o exclusive
> lock on the buffer. Now I realize that it's more about *recovery*: if the hint
> bit change is followed by a torn page write, the hint bit can get changed on
> disk but the checksum might not get updated. The wrong checksum is detected
> during recovery, but if XLOG does not contain the corresponding full page
> image, we're not able to recover.
>
> And with encryption, the consequence is even worse because torn page write
> causes not only wrong checksum of otherwise useful page, but really damaged
> page.

Correct.

> I'll enforce the FPW in the next version of the patch.

Cool.

I'm willing to put some effort into trying to get this into v13 if
you're willing to keep hacking on it, but there's probably a fair
amount to do and a year can go by in a hurry.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> I'm willing to put some effort into trying to get this into v13 if
> you're willing to keep hacking on it, but there's probably a fair
> amount to do and a year can go by in a hurry.

That'd be appreciated, especially by my boss. It doesn't seem likely that in
this situation I'll be assigned many other tasks of higher priority. So yes, I
expect to have quite some time for this patch. Thanks.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
PostgreSQL - Hans-Jürgen Schönig
Date:
On 4/4/19 5:52 PM, Antonin Houska wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I'm willing to put some effort into trying to get this into v13 if
>> you're willing to keep hacking on it, but there's probably a fair
>> amount to do and a year can go by in a hurry.
> That'd be appreciated, especially by my boss. It doesn't seem likely that in
> this situation I'll be assigned many other tasks of higher priority. So yes, I
> expect to have quite some time for this patch. Thanks.
>

if there is any chance of having this committed into core, we will of 
course provide the capacity to hack on this.
the same of course applies to the optimizer patches tony has been 
working on.

     many thanks

         hans


-- 
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Thu, Apr 4, 2019 at 11:52 AM Antonin Houska <ah@cybertec.at> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
> > I'm willing to put some effort into trying to get this into v13 if
> > you're willing to keep hacking on it, but there's probably a fair
> > amount to do and a year can go by in a hurry.
>
> That'd be appreciated, especially by my boss. It doesn't seem likely that in
> this situation I'll be assigned many other tasks of higher priority. So yes, I
> expect to have quite some time for this patch. Thanks.

Great.  I think it will be appreciated by my boss, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Mar 21, 2019 at 7:46 AM Antonin Houska <ah@cybertec.at> wrote:
> > Nevertheless, with the current version of our patch, PG should be resistant
> > against such a partial write anyway because we chose to align XLOG records to
> > 16 bytes (as long as the encryption is enabled) for the following reasons:
> >
> > If one XLOG record ends and the following one starts in the same encryption
> > block, both records can get corrupted during streaming replication. The
> > scenario looks like: 1) the first record is written on master (the unused part
> > of the block contains zeroes), 2) the block is encrypted and its initial part
> > (i.e. the number of bytes occupied by the first record in the plain text) is
> > streamed to slave, 3) the second record is written on master, 4) the
> > containing encryption block is encrypted again and the trailing part (i.e. the
> > number of bytes occupied by the second record) is streamed, 5) decryption of
> > the block on slave will produce garbage and thus corrupt both records. This is
> > because the trailing part of the block was filled with zeroes during
> > encryption, but it contains different data at decryption time.
>
> Wouldn't Tom's proposal to use a stream cipher fix all this?

Yes it would make the extra alignment unnecessary, but our solution tries to
meet specific requirements of disk encryption. Stream cipher appears to be
incompatible with these requirements:

https://en.wikipedia.org/wiki/Disk_encryption_theory

Currently we try to use the CBC-ESSIV mode. It's worth to mention that in this
mode, if the page is encrypted twice and if the plain data in the encryption
block N (i.e. block of 16 bytes) changes before the 2nd encryption, the
encrypted data will only change in blocks >= N. So the problem of losing
already flushed WAL records should not happen.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Fri, Apr 5, 2019 at 11:22 AM Antonin Houska <ah@cybertec.at> wrote:
> > Wouldn't Tom's proposal to use a stream cipher fix all this?
>
> Yes it would make the extra alignment unnecessary, but our solution tries to
> meet specific requirements of disk encryption. Stream cipher appears to be
> incompatible with these requirements:
>
> https://en.wikipedia.org/wiki/Disk_encryption_theory

Hmm.  Well, I don't know what to do about that, but I think this patch
is going to be facing an uphill battle if the encrypted and
unencrypted WAL formats use different alignment.

> Currently we try to use the CBC-ESSIV mode. It's worth to mention that in this
> mode, if the page is encrypted twice and if the plain data in the encryption
> block N (i.e. block of 16 bytes) changes before the 2nd encryption, the
> encrypted data will only change in blocks >= N. So the problem of losing
> already flushed WAL records should not happen.

Well, this is just a question of alignment.  If WAL records are at
least 16-byte aligned, then it should be fine.  But I have a feeling
they may just be MAXALIGN'd.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Apr 5, 2019 at 11:22 AM Antonin Houska <ah@cybertec.at> wrote:
> > > Wouldn't Tom's proposal to use a stream cipher fix all this?
> >
> > Yes it would make the extra alignment unnecessary, but our solution tries to
> > meet specific requirements of disk encryption. Stream cipher appears to be
> > incompatible with these requirements:
> >
> > https://en.wikipedia.org/wiki/Disk_encryption_theory
>
> Hmm.  Well, I don't know what to do about that, but I think this patch
> is going to be facing an uphill battle if the encrypted and
> unencrypted WAL formats use different alignment.

Once you said this could be a problem, I thought about it a bit more. The link
above tells that stream cipher is not good for disk encryption because it's
risky to encrypt different data (e.g. old an new version of disk sector) with
the same key (or rather with the same combination of key and initialization
vector).

However WAL is a special case because here we should never need to change the
data at given position. The only exception is that unused space (sequence of
zeroes) becomes a valid record. So if we avoid encryption of the unused space,
it might be o.k. to use the stream cipher for WAL.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Bruce Momjian
Date:
On Sat, Apr  6, 2019 at 03:29:13PM +0200, Antonin Houska wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
> 
> > On Fri, Apr 5, 2019 at 11:22 AM Antonin Houska <ah@cybertec.at> wrote:
> > > > Wouldn't Tom's proposal to use a stream cipher fix all this?
> > >
> > > Yes it would make the extra alignment unnecessary, but our solution tries to
> > > meet specific requirements of disk encryption. Stream cipher appears to be
> > > incompatible with these requirements:
> > >
> > > https://en.wikipedia.org/wiki/Disk_encryption_theory
> > 
> > Hmm.  Well, I don't know what to do about that, but I think this patch
> > is going to be facing an uphill battle if the encrypted and
> > unencrypted WAL formats use different alignment.
> 
> Once you said this could be a problem, I thought about it a bit more. The link
> above tells that stream cipher is not good for disk encryption because it's
> risky to encrypt different data (e.g. old an new version of disk sector) with
> the same key (or rather with the same combination of key and initialization
> vector).
> 
> However WAL is a special case because here we should never need to change the
> data at given position. The only exception is that unused space (sequence of
> zeroes) becomes a valid record. So if we avoid encryption of the unused space,
> it might be o.k. to use the stream cipher for WAL.

Agreed.  Requirement #2 is not needed for WAL:

    https://en.wikipedia.org/wiki/Disk_encryption_theory#Problem_definition
    2. Data retrieval and storage should both be fast operations, no
    matter where on the disk the data is stored.

because you are not writing into random parts of the WAL.

Using the same stream cipher system that SSH uses would make sense,
since it is byte-oriented, and it only encrypts the tail.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Apr 4, 2019 at 9:57 AM Antonin Houska <ah@cybertec.at> wrote:
> > I'll enforce the FPW in the next version of the patch.
> 
> Cool.
> 
> I'm willing to put some effort into trying to get this into v13 if
> you're willing to keep hacking on it, but there's probably a fair
> amount to do and a year can go by in a hurry.

Attached is the next version. It tries to address various problems pointed out
upthread, including documentation.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska <ah@cybertec.at> wrote:
> Attached is the next version. It tries to address various problems pointed out
> upthread, including documentation.

It's not immediately evident to me, perhaps because I haven't looked
at this stuff in a while, what the purpose of 0001 and 0002 is.  I
think it'd be helpful if you would generate these patches with 'git
format-patch' and include a meaningful commit message in each patch of
the series.

+       <para>
+        This setting specifies path to executable file that returns
+        either <literal>encryption_key=</literal> for key
+        and <literal>encryption_password=</literal> for password.
+       </para>

This is incomprehensible.  Executables don't return strings.  Also,
you can return "either a or b" or you can return "both a and b," but
you can't return "either a and b".

-         Build with support for <acronym>SSL</acronym> (encrypted)
-         connections. This requires the <productname>OpenSSL</productname>
-         package to be installed.  <filename>configure</filename> will check
-         for the required header files and libraries to make sure that
+         Build with support for on-disk data encryption
+         or <acronym>SSL</acronym> (encrypted) connections. This requires
+         the <productname>OpenSSL</productname> package to be
+         installed.  <filename>configure</filename> will check for the
+         required header files and libraries to make sure that

I think we should instead just remove the word 'connections'.  We
don't want to have multiple places in the documentation listing all
the things for which OpenSSL is required.

Generally, the documentation changes here have a lot of stuff about
key-or-password, which is quite confusing.  It seems like the idea is
that you accept either a key or a password from which a key is
derived, but I don't think that's explained super-clearly, and I
wonder whether it's unnecessary complexity.  Why not just require a
key always?

Another general point is that the documentation, comments, and README
need some work on the English.  They contain lots of information, but
they read somewhat awkwardly and are hard to understand in some
places.

Wherever you are adding a large hunk of code into the middle of an
existing function (e.g. XLogWrite), please consider putting the logic
in a new function and calling it from the original function, instead
of just inlining the entire thing.

xloginsert.c adds a new #include even though there are no other
changes to that file.  That should not be necessary.  If it is, you've
broken the principle the header files are compilable independently of
other header files (except postgres.h, which is a prerequisite for
every header file).

-XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
- Size count)
+XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
Size count)

Spurious hunk.  Please try to avoid unnecessary changes.

+ * XXX Currently the function is only called with startptr at page
+ * boundary. If it should change, encryption needs to reflect this fact,
+ * i.e. read data from the beginning of the page. Actually this is a
+ * reason to adjust and use walsender.c:XLogRead().

This comment and the similar comment at the end of the function are
not very clear.   It sorta sounds, though, like maybe the decryption
logic should move from here to the caller or something.  You seem to
be saying that, after your changes, this function depends on the
caller using it in a certain way, which is often a sign that you
haven't abstracted things in the right way.  And if walsender.c's
version of the logic is better, why not also do that stuff here?

@@ -52,7 +54,6 @@

 uint32 bootstrap_data_checksum_version = 0; /* No checksum */

-
 #define ALLOC(t, c) \
  ((t *) MemoryContextAllocZero(TopMemoryContext, (unsigned)(c) * sizeof(t)))

Another useless hunk.

+ {
+ RelFileNode fromNode = {srctablespace, src_dboid, InvalidOid};
+ RelFileNode toNode = {dsttablespace, dboid, InvalidOid};
+
+ copydir(srcpath, dstpath, &fromNode, &toNode);
+ }

I suggest instead declaring these variables in the surrounding scope
and initializing them here.  This style is unlike most PostgreSQL
code.

+ case WAIT_EVENT_KDF_FILE_READ:
+ event_name = "KDFFileRead";
+ break;

New wait events need documentation.

+are encrypted differently. Furthermore, as the Postgres page starts with LSN,
+even if only the last encryption block of the page changes, the whole cipher
+text of the page will be different after the next encryption.

But for a hint-bit-only change, this would not be true, unless we
force wal_log_hints.  Sounds like we're doing that, but it might be
worth mentioning it here also.

There's also the question of unlogged tables, which do not have an
LSN.  This logic wouldn't apply there, which might be a problem.

+If the encryption is enabled, the following requirements need to be taken into
+account:
+
+1. The file buffer cannot be positioned at arbitrary offset of the file. If

I think it's no good at all to think about enforcing requirements like
this only when data_encryption is used.  That's just going to lead to
bugs.  I think we need to either find a way to mask this restriction
from the users of the buffile facility, or else revise all those users
so that they respect this new rule and enforce it with an
unconditional assert.

That brings me to another point, which is that a complex patch like
this really needs to be divided up into smaller patches for review.  I
see you've started that a little bit (but see comments on commit
messages above) but you should try to go further with it.  For
example, changing the buffile contract so that there is some new rule
about positioning should be done as a separate patch (or not done at
all).  That patch should be written so that it applies before the main
encryption patch, have comments and asserts explaining the new rules,
etc.

+To store other kinds of data encrypted than the ones above, developers are
+advised to use BufFileWriteTransient() and BufFileReadTransient() functions

The previous paragraph is about BufFileWrite() and BufFileRead(), and
that section is titled "temporary files."  Now here you say that for
other kinds of data (not relations, xlog, or temporary files) we
should use BufFileWriteTransient().  That seems pretty weird, because
data that it is not temporary files would presumably be, uh, permanent
files?  So then why is Transient in the name?  I imagine there are
good reasons here but it needs to be made clearer.

+pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with
+the --link option.)

That seems pretty unfortunate.  Any way to do better?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Thanks of another round of review.

Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska <ah@cybertec.at> wrote:
> > Attached is the next version. It tries to address various problems pointed out
> > upthread, including documentation.
>
> It's not immediately evident to me, perhaps because I haven't looked
> at this stuff in a while, what the purpose of 0001 and 0002 is.  I
> think it'd be helpful if you would generate these patches with 'git
> format-patch' and include a meaningful commit message in each patch of
> the series.

ok, will do.

> +       <para>
> +        This setting specifies path to executable file that returns
> +        either <literal>encryption_key=</literal> for key
> +        and <literal>encryption_password=</literal> for password.
> +       </para>
>
> This is incomprehensible.  Executables don't return strings.

That's right, the key should be written to the standard output.

> Also, you can return "either a or b" or you can return "both a and b," but
> you can't return "either a and b".

Sure, this is a thinko.

> -         Build with support for <acronym>SSL</acronym> (encrypted)
> -         connections. This requires the <productname>OpenSSL</productname>
> -         package to be installed.  <filename>configure</filename> will check
> -         for the required header files and libraries to make sure that
> +         Build with support for on-disk data encryption
> +         or <acronym>SSL</acronym> (encrypted) connections. This requires
> +         the <productname>OpenSSL</productname> package to be
> +         installed.  <filename>configure</filename> will check for the
> +         required header files and libraries to make sure that
>
> I think we should instead just remove the word 'connections'.  We
> don't want to have multiple places in the documentation listing all
> the things for which OpenSSL is required.

ok

> Generally, the documentation changes here have a lot of stuff about
> key-or-password, which is quite confusing.  It seems like the idea is
> that you accept either a key or a password from which a key is
> derived, but I don't think that's explained super-clearly, and I
> wonder whether it's unnecessary complexity.  Why not just require a
> key always?

We thought that it's more convenient for administrator to enter password. To
achieve that, we can instead tell the admin how to use the key derivation tool
(pg_keysetup) and send its output to postgres. So yes, it's possible to always
receive the key from the "encryption key command". I'll change this.

> Another general point is that the documentation, comments, and README
> need some work on the English.  They contain lots of information, but
> they read somewhat awkwardly and are hard to understand in some
> places.

ok, I'll try to explain the tricky things in simple senteces.

> Wherever you are adding a large hunk of code into the middle of an
> existing function (e.g. XLogWrite), please consider putting the logic
> in a new function and calling it from the original function, instead
> of just inlining the entire thing.

ok

> xloginsert.c adds a new #include even though there are no other
> changes to that file.  That should not be necessary.  If it is, you've
> broken the principle the header files are compilable independently of
> other header files (except postgres.h, which is a prerequisite for
> every header file).

It's not intentional, I failed to remove it when moving some encryption
related function calls elsewhere.

> -XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
> - Size count)
> +XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
> Size count)

> Spurious hunk.  Please try to avoid unnecessary changes.

Maybe I added and later removed some argument and forgot to enforce the
maximum line length. I expected pg_indent to fix things like this but it
apparently does not. I'll pay more attention to such things next time.

> + * XXX Currently the function is only called with startptr at page
> + * boundary. If it should change, encryption needs to reflect this fact,
> + * i.e. read data from the beginning of the page. Actually this is a
> + * reason to adjust and use walsender.c:XLogRead().
>
> This comment and the similar comment at the end of the function are
> not very clear.   It sorta sounds, though, like maybe the decryption
> logic should move from here to the caller or something.  You seem to
> be saying that, after your changes, this function depends on the
> caller using it in a certain way, which is often a sign that you
> haven't abstracted things in the right way.  And if walsender.c's
> version of the logic is better, why not also do that stuff here?

Please consider this an incomplete implementation of the XLOG
decryption. Eventually these restrictions will be relaxed and the decryption
will be hidden from caller. Currently there are two implementations of
XLogRead() in the backend (plus one in the frontend) and I didn't want to
spend too much effort on merging the decryption into core until this another
patch gets committed:

https://commitfest.postgresql.org/23/2098/

The current XLogRead() functions are not identical so the decryption specific
code can't be simply copy & pasted.

So far I'll turn this comment into a TODO comment and mention the refactoring
patch there.

> + {
> + RelFileNode fromNode = {srctablespace, src_dboid, InvalidOid};
> + RelFileNode toNode = {dsttablespace, dboid, InvalidOid};
> +
> + copydir(srcpath, dstpath, &fromNode, &toNode);
> + }
>
> I suggest instead declaring these variables in the surrounding scope
> and initializing them here.  This style is unlike most PostgreSQL
> code.

ok

> + case WAIT_EVENT_KDF_FILE_READ:
> + event_name = "KDFFileRead";
> + break;
>
> New wait events need documentation.

ok

> +are encrypted differently. Furthermore, as the Postgres page starts with LSN,
> +even if only the last encryption block of the page changes, the whole cipher
> +text of the page will be different after the next encryption.
>
> But for a hint-bit-only change, this would not be true, unless we
> force wal_log_hints.  Sounds like we're doing that, but it might be
> worth mentioning it here also.

ok

> There's also the question of unlogged tables, which do not have an
> LSN.  This logic wouldn't apply there, which might be a problem.

I missed that. The only idea I have now is that a "fake LSN" is set even for
unlogged table if encryption is enabled. Actually that LSN would only be a
counter that gets incremented each time the page contents has changed.

> +If the encryption is enabled, the following requirements need to be taken into
> +account:
> +
> +1. The file buffer cannot be positioned at arbitrary offset of the file. If
>
> I think it's no good at all to think about enforcing requirements like
> this only when data_encryption is used.  That's just going to lead to
> bugs.  I think we need to either find a way to mask this restriction
> from the users of the buffile facility, or else revise all those users
> so that they respect this new rule and enforce it with an
> unconditional assert.

Sorry, this is misunderstanding. The requirements mentioned in the "Temporary
files" section of the README apply to BufFileWrite() and BufFileRead()
internally. These functions take care of the correct buffer positioning. The
caller does not have to care. I'll try to make this part clearer.

> That brings me to another point, which is that a complex patch like
> this really needs to be divided up into smaller patches for review.  I
> see you've started that a little bit (but see comments on commit
> messages above) but you should try to go further with it.

Maintenance of a long patch series requires additional effort and I was busy
enough with major code rearrangements so far. I'll continue splitting the
stuff into more diffs.

> +To store other kinds of data encrypted than the ones above, developers are
> +advised to use BufFileWriteTransient() and BufFileReadTransient() functions
>
> The previous paragraph is about BufFileWrite() and BufFileRead(), and
> that section is titled "temporary files."  Now here you say that for
> other kinds of data (not relations, xlog, or temporary files) we
> should use BufFileWriteTransient().  That seems pretty weird, because
> data that it is not temporary files would presumably be, uh, permanent
> files?  So then why is Transient in the name?  I imagine there are
> good reasons here but it needs to be made clearer.

"Transient" here means that the file descriptor is kept open for rather short
time, typically until transaction boundary (unlike relation files). However
the file remains on disk even if the descriptor is closed. I believe this is
consistent with the use of OpenTransientFile(). The phrase "other kinds of
data" is probably too generic and more explanation is needed.

> +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with
> +the --link option.)
>
> That seems pretty unfortunate.  Any way to do better?

The reason is that the initialization vector (IV) for relation page encryption
is derived from RelFileNode, and both dbNode and relNode can be different in
the new cluster. Therefore the data of the old cluster need to be decrypted
and encrypted using the new IV before it's copied to the new cluster. That's
why we can't use symlink.

As an alternative, I thought of deriving the IV by hashing the
<database>.<schema>.<object name> string, but that would mean that the
"reencryption" is triggered by commands like "ALTER TABLE ... RENAME TO ...",
"ALTER TABLE ... SET SCHEMA ...", etc.

Currently I'm thinking of making the IV less dependent on RelFileNode
(e.g. generate a random number for which RelFileNode serves as the seed) and
storing it in pg_class as a storage parameter. Therefore we wouldn't have to
pay any extra attention to transfer of the IV to the new cluster. However, the
IV storage parameter would need special treatment:

1. DBA should not be able to remove or change it using ALTER TABLE command
because inability to decrypt the data can be the only outcome.

2. The \d+ command of psql client should not display the IV. Displaying it
probably wouldn't be a security risk, but as we encrypt the whole instance,
the IV would appear for every single table and that might be annoying.

What do you think about this approach?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Tue, May 28, 2019 at 11:27 AM Antonin Houska <ah@cybertec.at> wrote:
> We thought that it's more convenient for administrator to enter password. To
> achieve that, we can instead tell the admin how to use the key derivation tool
> (pg_keysetup) and send its output to postgres. So yes, it's possible to always
> receive the key from the "encryption key command". I'll change this.

Sounds good.  I'm not quite sure how this is going to work.  Ideally
you'd like the encryption key command to fetch the key from something
like ssh-agent, or maybe pop up a window on the user's terminal with a
key prompt.  Just reading from stdin and writing to stdout is not
going to be very convenient...

> Maintenance of a long patch series requires additional effort and I was busy
> enough with major code rearrangements so far. I'll continue splitting the
> stuff into more diffs.

Right, I understand.  Whatever can be split off can be reviewed and
committed separately, which will start to ease the burden of
maintaining the patch set.  I hope.  But getting over the initial hump
can be a lot of work.

> > +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with
> > +the --link option.)
> >
> > That seems pretty unfortunate.  Any way to do better?
>
> The reason is that the initialization vector (IV) for relation page encryption
> is derived from RelFileNode, and both dbNode and relNode can be different in
> the new cluster. Therefore the data of the old cluster need to be decrypted
> and encrypted using the new IV before it's copied to the new cluster. That's
> why we can't use symlink.
>
> As an alternative, I thought of deriving the IV by hashing the
> <database>.<schema>.<object name> string, but that would mean that the
> "reencryption" is triggered by commands like "ALTER TABLE ... RENAME TO ...",
> "ALTER TABLE ... SET SCHEMA ...", etc.
>
> Currently I'm thinking of making the IV less dependent on RelFileNode
> (e.g. generate a random number for which RelFileNode serves as the seed) and
> storing it in pg_class as a storage parameter. Therefore we wouldn't have to
> pay any extra attention to transfer of the IV to the new cluster. However, the
> IV storage parameter would need special treatment:
>
> 1. DBA should not be able to remove or change it using ALTER TABLE command
> because inability to decrypt the data can be the only outcome.
>
> 2. The \d+ command of psql client should not display the IV. Displaying it
> probably wouldn't be a security risk, but as we encrypt the whole instance,
> the IV would appear for every single table and that might be annoying.
>
> What do you think about this approach?

I don't think it's a very good idea.  For one thing, you can't store
the IV for pg_class in pg_class; that has a circularity problem.  For
another thing, I'm not sure we can or should try to do catalog access
from every place where an IV might be needed.  In fact, I'd go so far
as to say that sounds like a recipe for a lot of pain and fragility.

One potential approach is to modify pg_upgrade to preserve the dbnode
and relfilenode.  I don't know of any fundamental reason why such a
change shouldn't be possible, although it is probably a bunch of work,
and there may be problems with which I am not acquainted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, May 28, 2019 at 11:27 AM Antonin Houska <ah@cybertec.at> wrote:
> > We thought that it's more convenient for administrator to enter password. To
> > achieve that, we can instead tell the admin how to use the key derivation tool
> > (pg_keysetup) and send its output to postgres. So yes, it's possible to always
> > receive the key from the "encryption key command". I'll change this.
>
> Sounds good.  I'm not quite sure how this is going to work.  Ideally
> you'd like the encryption key command to fetch the key from something
> like ssh-agent, or maybe pop up a window on the user's terminal with a
> key prompt.  Just reading from stdin and writing to stdout is not
> going to be very convenient...

When I mentioned writing to stdout in my previous email, I viewed it from the
perspective of postmaster: whichever way the command gets the password from
the DBA, it'll always write it to stdout and postmaster will read it from
there. This was to address your objection that the executables do not actually
"return" strings.

> > > +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with
> > > +the --link option.)
> > >
> > > That seems pretty unfortunate.  Any way to do better?
> >
> >
> > Currently I'm thinking of making the IV less dependent on RelFileNode
> > (e.g. generate a random number for which RelFileNode serves as the seed) and
> > storing it in pg_class as a storage parameter.
>
> I don't think it's a very good idea.  For one thing, you can't store
> the IV for pg_class in pg_class; that has a circularity problem.  For
> another thing, I'm not sure we can or should try to do catalog access
> from every place where an IV might be needed.  In fact, I'd go so far
> as to say that sounds like a recipe for a lot of pain and fragility.
>
> One potential approach is to modify pg_upgrade to preserve the dbnode
> and relfilenode.  I don't know of any fundamental reason why such a
> change shouldn't be possible, although it is probably a bunch of work,
> and there may be problems with which I am not acquainted.

Actually it doesn't seem to be that tricky. In the --binary-upgrade mode,
pg_dump does record pg_class.oid:

-- For binary upgrade, must preserve pg_class oids
SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('57465'::pg_catalog.oid);

The header comment in pg_upgrade.c indicates that this is because of TOAST. So
I think that dbnode and relfilenode are not preserved just because there was
no strong reason to do so by now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Fri, May 31, 2019 at 2:59 AM Antonin Houska <ah@cybertec.at> wrote:
> > Sounds good.  I'm not quite sure how this is going to work.  Ideally
> > you'd like the encryption key command to fetch the key from something
> > like ssh-agent, or maybe pop up a window on the user's terminal with a
> > key prompt.  Just reading from stdin and writing to stdout is not
> > going to be very convenient...
>
> When I mentioned writing to stdout in my previous email, I viewed it from the
> perspective of postmaster: whichever way the command gets the password from
> the DBA, it'll always write it to stdout and postmaster will read it from
> there. This was to address your objection that the executables do not actually
> "return" strings.

So the part about returning strings is really just a wording issue:
the documentation needs to talk about data sent to stdout or wherever,
because that's what really happens, and somebody writing such a
command from scratch needs to know what it must do.

What I'm talking about here is that it also has to be reasonably
possible to write an encryption key command that does something
useful.  I don't have a really clear vision for how that's going to
work.  Nobody wants the server, which is probably being launched by
pg_ctl or systemd or both, to prompt using its own stdin/stderr, but
the we need to think about how the prompting is actually going to
work.  One idea is to do it via the FEBE protocol: connect to the
server using libpq, issue a new command that causes the server to
enter COPY mode, and then send the encryption key as COPY data.
However, that idea doesn't really work, because we've got to be able
to get the key before we run recovery or reach consistency, so the
server won't be listening for connections at that point.  Also, we've
got to have a way for this to work in single-user mode, which also
can't listen for connections.  It's probably all fine if your
encryption key command is something like 'curl
https://my.secret.keyserver.me/sekcret.key' because then there's an
external server which you can just go access - but I don't quite
understand how you'd do interactive prompting from here.  Sorry to get
hung up on what may seem like a minor point, but I think it's actually
fairly fundamental: we've got to have a clear vision of how end-users
will really be able to make use of this.

> The header comment in pg_upgrade.c indicates that this is because of TOAST. So
> I think that dbnode and relfilenode are not preserved just because there was
> no strong reason to do so by now.

Maybe you want to propose an independent patch making that change?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, May 31, 2019 at 2:59 AM Antonin Houska <ah@cybertec.at> wrote:
> > > Sounds good.  I'm not quite sure how this is going to work.  Ideally
> > > you'd like the encryption key command to fetch the key from something
> > > like ssh-agent, or maybe pop up a window on the user's terminal with a
> > > key prompt.  Just reading from stdin and writing to stdout is not
> > > going to be very convenient...
> >
> > When I mentioned writing to stdout in my previous email, I viewed it from the
> > perspective of postmaster: whichever way the command gets the password from
> > the DBA, it'll always write it to stdout and postmaster will read it from
> > there. This was to address your objection that the executables do not actually
> > "return" strings.
>
> So the part about returning strings is really just a wording issue:
> the documentation needs to talk about data sent to stdout or wherever,
> because that's what really happens, and somebody writing such a
> command from scratch needs to know what it must do.
>
> What I'm talking about here is that it also has to be reasonably
> possible to write an encryption key command that does something
> useful.  I don't have a really clear vision for how that's going to
> work.  Nobody wants the server, which is probably being launched by
> pg_ctl or systemd or both, to prompt using its own stdin/stderr, but
> the we need to think about how the prompting is actually going to
> work.

Since you mentioned ssh-agent above, I think that postmaster can, instead of
running a command, create an unix socket and read the key from there. (I refer
only to the socket as a communication channel, not to the protocol - ssh-agent
does not seem to actually send key over the socket.) Unlike the socket for
backend connections, this one would only be readable by the user that runs
postmaster, and would only exist during the encryption initialization.

The simplest approach from the perspective of the DBA is that pg_ctl can write
the key to the socket. Besides that we can also implement a separate utility
to send the key, to be used in other special cases such as starting postgres
via systemd.

(If the unix socket is a problem on windows, we might need to use named pipe
instead.)

> > The header comment in pg_upgrade.c indicates that this is because of TOAST. So
> > I think that dbnode and relfilenode are not preserved just because there was
> > no strong reason to do so by now.
>
> Maybe you want to propose an independent patch making that change?

I think of a separate diff in the encryption patch series. As the encryption
is the only reason for this enhancement, I'm not sure if it deserves a
separate CF entry. (Although it might be considered refactoring because
eventually pg_upgrade won't need to handle the different relnodes, and thus it
might become a little bit simpler.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Pavlo Golub
Date:
Greetings, Antonin.

You wrote 2019-06-05, 15:32:

> Robert Haas <robertmhaas@gmail.com> wrote:

>> On Fri, May 31, 2019 at 2:59 AM Antonin Houska <ah@cybertec.at> wrote:
>> > > Sounds good.  I'm not quite sure how this is going to work.  Ideally
>> > > you'd like the encryption key command to fetch the key from something
>> > > like ssh-agent, or maybe pop up a window on the user's terminal with a
>> > > key prompt.  Just reading from stdin and writing to stdout is not
>> > > going to be very convenient...
>> >
>> > When I mentioned writing to stdout in my previous email, I viewed it from the
>> > perspective of postmaster: whichever way the command gets the password from
>> > the DBA, it'll always write it to stdout and postmaster will read it from
>> > there. This was to address your objection that the executables do not actually
>> > "return" strings.
>> 
>> So the part about returning strings is really just a wording issue:
>> the documentation needs to talk about data sent to stdout or wherever,
>> because that's what really happens, and somebody writing such a
>> command from scratch needs to know what it must do.
>> 
>> What I'm talking about here is that it also has to be reasonably
>> possible to write an encryption key command that does something
>> useful.  I don't have a really clear vision for how that's going to
>> work.  Nobody wants the server, which is probably being launched by
>> pg_ctl or systemd or both, to prompt using its own stdin/stderr, but
>> the we need to think about how the prompting is actually going to
>> work.

> Since you mentioned ssh-agent above, I think that postmaster can, instead of
> running a command, create an unix socket and read the key from there. (I refer
> only to the socket as a communication channel, not to the protocol - ssh-agent
> does not seem to actually send key over the socket.) Unlike the socket for
> backend connections, this one would only be readable by the user that runs
> postmaster, and would only exist during the encryption initialization.

> The simplest approach from the perspective of the DBA is that pg_ctl can write
> the key to the socket. Besides that we can also implement a separate utility
> to send the key, to be used in other special cases such as starting postgres
> via systemd.

> (If the unix socket is a problem on windows, we might need to use named pipe
> instead.)

Yes, that definitely a problem on Windows. Pipes seems reasonable. Are
mapped files are appropriate from the security point of view?

>> > The header comment in pg_upgrade.c indicates that this is because of TOAST. So
>> > I think that dbnode and relfilenode are not preserved just because there was
>> > no strong reason to do so by now.
>> 
>> Maybe you want to propose an independent patch making that change?

> I think of a separate diff in the encryption patch series. As the encryption
> is the only reason for this enhancement, I'm not sure if it deserves a
> separate CF entry. (Although it might be considered refactoring because
> eventually pg_upgrade won't need to handle the different relnodes, and thus it
> might become a little bit simpler.)




-- 
Kind regards,
 Pavlo                          mailto:pavlo.golub@cybertec.at




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Bruce Momjian
Date:
On Mon, Jun  3, 2019 at 12:04:54PM -0400, Robert Haas wrote:
> 
> What I'm talking about here is that it also has to be reasonably
> possible to write an encryption key command that does something
> useful.  I don't have a really clear vision for how that's going to
> work.  Nobody wants the server, which is probably being launched by
> pg_ctl or systemd or both, to prompt using its own stdin/stderr, but
> the we need to think about how the prompting is actually going to
> work.  One idea is to do it via the FEBE protocol: connect to the
> server using libpq, issue a new command that causes the server to
> enter COPY mode, and then send the encryption key as COPY data.
> However, that idea doesn't really work, because we've got to be able
> to get the key before we run recovery or reach consistency, so the
> server won't be listening for connections at that point.  Also, we've
> got to have a way for this to work in single-user mode, which also
> can't listen for connections.  It's probably all fine if your
> encryption key command is something like 'curl
> https://my.secret.keyserver.me/sekcret.key' because then there's an
> external server which you can just go access - but I don't quite
> understand how you'd do interactive prompting from here.  Sorry to get
> hung up on what may seem like a minor point, but I think it's actually
> fairly fundamental: we've got to have a clear vision of how end-users
> will really be able to make use of this.

pgcryptoey has an example of prompting from /dev/tty:

    http://momjian.us/download/pgcryptokey/

Look at pgcryptokey_default.sample.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Bruce Momjian
Date:
On Thu, Jun 13, 2019 at 09:14:13PM -0400, Bruce Momjian wrote:
> On Mon, Jun  3, 2019 at 12:04:54PM -0400, Robert Haas wrote:
> > 
> > What I'm talking about here is that it also has to be reasonably
> > possible to write an encryption key command that does something
> > useful.  I don't have a really clear vision for how that's going to
> > work.  Nobody wants the server, which is probably being launched by
> > pg_ctl or systemd or both, to prompt using its own stdin/stderr, but
> > the we need to think about how the prompting is actually going to
> > work.  One idea is to do it via the FEBE protocol: connect to the
> > server using libpq, issue a new command that causes the server to
> > enter COPY mode, and then send the encryption key as COPY data.
> > However, that idea doesn't really work, because we've got to be able
> > to get the key before we run recovery or reach consistency, so the
> > server won't be listening for connections at that point.  Also, we've
> > got to have a way for this to work in single-user mode, which also
> > can't listen for connections.  It's probably all fine if your
> > encryption key command is something like 'curl
> > https://my.secret.keyserver.me/sekcret.key' because then there's an
> > external server which you can just go access - but I don't quite
> > understand how you'd do interactive prompting from here.  Sorry to get
> > hung up on what may seem like a minor point, but I think it's actually
> > fairly fundamental: we've got to have a clear vision of how end-users
> > will really be able to make use of this.
> 
> pgcryptoey has an example of prompting from /dev/tty:
> 
>     http://momjian.us/download/pgcryptokey/
> 
> Look at pgcryptokey_default.sample.

Also, look at pgcryptokey_default.c and its use with
shared_preload_libraries for a full example of prompting on server boot.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Bruce Momjian <bruce@momjian.us> wrote:

> On Mon, Jun  3, 2019 at 12:04:54PM -0400, Robert Haas wrote:
> > 
> > What I'm talking about here is that it also has to be reasonably
> > possible to write an encryption key command that does something
> > useful.  I don't have a really clear vision for how that's going to
> > work.  Nobody wants the server, which is probably being launched by
> > pg_ctl or systemd or both, to prompt using its own stdin/stderr, but
> > the we need to think about how the prompting is actually going to
> > work.  One idea is to do it via the FEBE protocol: connect to the
> > server using libpq, issue a new command that causes the server to
> > enter COPY mode, and then send the encryption key as COPY data.
> > However, that idea doesn't really work, because we've got to be able
> > to get the key before we run recovery or reach consistency, so the
> > server won't be listening for connections at that point.  Also, we've
> > got to have a way for this to work in single-user mode, which also
> > can't listen for connections.  It's probably all fine if your
> > encryption key command is something like 'curl
> > https://my.secret.keyserver.me/sekcret.key' because then there's an
> > external server which you can just go access - but I don't quite
> > understand how you'd do interactive prompting from here.  Sorry to get
> > hung up on what may seem like a minor point, but I think it's actually
> > fairly fundamental: we've got to have a clear vision of how end-users
> > will really be able to make use of this.
> 
> pgcryptoey has an example of prompting from /dev/tty:
> 
>     http://momjian.us/download/pgcryptokey/
> 
> Look at pgcryptokey_default.sample.

It's a nice exercise of shell features but does not seem to be easily
portable, and it has some other restrictions. One is mentioned in a comment:

# Use 'pg_ctl -s' so starting dots do not interfere with password entry

I think that besides the dots, log messages can also be disturbing if logging
collector is not enabled. Another corner case might be there when user runs
pg_ctl with --no-wait option and wants to do run some other command
immediately.

I'm thinking how to teach postmaster to accept FEBE protocol connections
temporarily, just to receive the key. The user applications like pg_ctl,
initdb or pg_upgrade would retrieve the key / password from the DBA, then
start postmaster and send it the key.

Perhaps the message format should be a bit generic so that extensions like
this can use it to receive their keys too.

(The idea of an unix socket or named pipe I proposed upthread is not good
because it's harder to implement in a portable way.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Peter Eisentraut
Date:
On 2019-06-17 11:23, Antonin Houska wrote:
> I'm thinking how to teach postmaster to accept FEBE protocol connections
> temporarily, just to receive the key. The user applications like pg_ctl,
> initdb or pg_upgrade would retrieve the key / password from the DBA, then
> start postmaster and send it the key.
> 
> Perhaps the message format should be a bit generic so that extensions like
> this can use it to receive their keys too.
> 
> (The idea of an unix socket or named pipe I proposed upthread is not good
> because it's harder to implement in a portable way.)

How are the requirements here different from ssl_passphrase_command?
Why do we need a new mechanism?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Robert Haas
Date:
On Tue, Jun 25, 2019 at 8:28 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> How are the requirements here different from ssl_passphrase_command?
> Why do we need a new mechanism?

I don't think that the requirements are different, and I don't think
we need a new mechanism.

I am curious exactly how you would set up ssl_passphrase_command to
prompt interactively.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Peter Eisentraut
Date:
On 2019-06-25 15:39, Robert Haas wrote:
> On Tue, Jun 25, 2019 at 8:28 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> How are the requirements here different from ssl_passphrase_command?
>> Why do we need a new mechanism?
> 
> I don't think that the requirements are different, and I don't think
> we need a new mechanism.
> 
> I am curious exactly how you would set up ssl_passphrase_command to
> prompt interactively.

For example like this:
https://www.2ndquadrant.com/en/blog/postgresql-passphrase-protected-ssl-keys-systemd/

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Antonin Houska
Date:
Antonin Houska <ah@cybertec.at> wrote:

> Thanks of another round of review.
> 
> Robert Haas <robertmhaas@gmail.com> wrote:
> 
> > On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska <ah@cybertec.at> wrote:
> > > Attached is the next version. It tries to address various problems pointed out
> > > upthread, including documentation.

This is the next version.

I've reworked the way key is passed to postmaster (v04-0003-...) so it's
easier to call the encryption key command interactively, adjusted pg_upgrade
so that it preserves database, tablespace and relfilenode OIDs (v04-0014-...),
reorganized the code a bit and split the code into more diffs.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Bruce Momjian
Date:
On Tue, Jun 25, 2019 at 02:28:00PM +0200, Peter Eisentraut wrote:
> On 2019-06-17 11:23, Antonin Houska wrote:
> > I'm thinking how to teach postmaster to accept FEBE protocol connections
> > temporarily, just to receive the key. The user applications like pg_ctl,
> > initdb or pg_upgrade would retrieve the key / password from the DBA, then
> > start postmaster and send it the key.
> > 
> > Perhaps the message format should be a bit generic so that extensions like
> > this can use it to receive their keys too.
> > 
> > (The idea of an unix socket or named pipe I proposed upthread is not good
> > because it's harder to implement in a portable way.)
> 
> How are the requirements here different from ssl_passphrase_command?
> Why do we need a new mechanism?

Agreed.  My pgcryptokey prompting shell script was mostly a
proof-of-concept.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

From
Thomas Munro
Date:
On Sat, Jul 6, 2019 at 2:42 AM Antonin Houska <ah@cybertec.at> wrote:
> I've reworked the way key is passed to postmaster (v04-0003-...) so it's
> easier to call the encryption key command interactively, adjusted pg_upgrade
> so that it preserves database, tablespace and relfilenode OIDs (v04-0014-...),
> reorganized the code a bit and split the code into more diffs.

Hi Antonin,

Some robotic feedback:

1.  On Linux, an assertion failed:

ExceptionalCondition (conditionName=conditionName@entry=0x973891
"!(decrypt_p == p)", errorType=errorType@entry=0x928d7d
"FailedAssertion", fileName=fileName@entry=0x973827 "xlogutils.c",
lineNumber=lineNumber@entry=815) at assert.c:54

See full stack here:
https://travis-ci.org/postgresql-cfbot/postgresql/builds/555550833

2.  On Windows the build failed:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46469

-- 
Thomas Munro
https://enterprisedb.com



Re: WIP: Data at rest encryption

From
Shawn Wang
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, failed
Spec compliant:           tested, failed
Documentation:            not tested

Hi Antonin,
It is very glad to see the new patch. I used the public patches a long time ago.
I did some tests like the stream replication, much data running, temporary files encryption.
I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size =
EVP_CIPHER_CTX_block_size(ctx);under the #ifdef USE_ASSERT_CHECKING.
 
There is some problem to merge your patches to the latest kernel in the pg_ctl.c.

Regards,

--
Shawn Wang

Re: WIP: Data at rest encryption

From
Alvaro Herrera
Date:
On 2019-Aug-02, Shawn Wang wrote:

> Hi Antonin,
> It is very glad to see the new patch. I used the public patches a long time ago.
> I did some tests like the stream replication, much data running, temporary files encryption.
> I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size =
EVP_CIPHER_CTX_block_size(ctx);under the #ifdef USE_ASSERT_CHECKING.
 
> There is some problem to merge your patches to the latest kernel in the pg_ctl.c.

Is a new, fixed version going to be posted soon?  It's been a while.

Also, apologies if this has been asked before, but: how does this patch
relate to the stuff being discussed in
https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Data at rest encryption

From
Ibrar Ahmed
Date:


On Tue, Sep 3, 2019 at 10:21 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Aug-02, Shawn Wang wrote:

> Hi Antonin,
> It is very glad to see the new patch. I used the public patches a long time ago.
> I did some tests like the stream replication, much data running, temporary files encryption.
> I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size = EVP_CIPHER_CTX_block_size(ctx); under the #ifdef USE_ASSERT_CHECKING.
> There is some problem to merge your patches to the latest kernel in the pg_ctl.c.

Is a new, fixed version going to be posted soon?  It's been a while.

Also, apologies if this has been asked before, but: how does this patch
relate to the stuff being discussed in
https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ?

Yes, it is related to that, but we don't have that on our agenda in a weekly meeting.  It has
many conflicting points about what we are doing. Swada / Bruce can comment on that.
  
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




--
Ibrar Ahmed

Re: WIP: Data at rest encryption

From
Antonin Houska
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2019-Aug-02, Shawn Wang wrote:
>
> > Hi Antonin,
> > It is very glad to see the new patch. I used the public patches a long time ago.
> > I did some tests like the stream replication, much data running, temporary files encryption.
> > I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size =
EVP_CIPHER_CTX_block_size(ctx);under the #ifdef USE_ASSERT_CHECKING. 
> > There is some problem to merge your patches to the latest kernel in the pg_ctl.c.
>
> Is a new, fixed version going to be posted soon?  It's been a while.
>
> Also, apologies if this has been asked before, but: how does this patch
> relate to the stuff being discussed in
> https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ?

This thread started later than our effort but important design questions are
being discussed there. So far there seems to be no consensus whether
full-instance encryption should be implemented first, so any effort spent on
this patch might get wasted. When/if there will be an agreement on the design,
we'll see how much of this patch can be used.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: WIP: Data at rest encryption

From
Shawn Wang
Date:
---- On Wed, 04 Sep 2019 00:56:15 +0800 Alvaro Herrera <alvherre@2ndquadrant.com> wrote ----

On 2019-Aug-02, Shawn Wang wrote:

> Hi Antonin,
> It is very glad to see the new patch. I used the public patches a long time ago.
> I did some tests like the stream replication, much data running, temporary files encryption.
> I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size = EVP_CIPHER_CTX_block_size(ctx); under the #ifdef USE_ASSERT_CHECKING.
> There is some problem to merge your patches to the latest kernel in the pg_ctl.c.

Is a new, fixed version going to be posted soon? It's been a while. 

Also, apologies if this has been asked before, but: how does this patch
relate to the stuff being discussed in


Hi Álvaro,

Thank you for a reply.

I mainly said that the issues in the src/backend/storage/file/encryption.c. If somebody want to use these patches, I think Antonin need to fix it.
It does not relate to the stuff being discussed in TDE. As I know, some company use these patches to encrypt data, even if these issues don't matter.

Regards,
--
Shawn Wang

Re: WIP: Data at rest encryption

From
Michael Paquier
Date:
On Wed, Sep 04, 2019 at 06:56:18AM +0200, Antonin Houska wrote:
> This thread started later than our effort but important design questions are
> being discussed there. So far there seems to be no consensus whether
> full-instance encryption should be implemented first, so any effort spent on
> this patch might get wasted. When/if there will be an agreement on the design,
> we'll see how much of this patch can be used.

I see.  I have marked the patch as returned with feedback in CF
2019-11.  Let's see how the other one finishes, before perhaps coming
back to this one.
--
Michael

Attachment