Thread: Transparent Data Encryption (TDE) and encrypted files

Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
For full-cluster Transparent Data Encryption (TDE), the current plan is
to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
overflow).  The plan is:

    https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption

We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
other files.  Is that correct?  Do any other PGDATA files contain user
data?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tels
Date:
Moin,

On 2019-09-30 23:26, Bruce Momjian wrote:
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
> 
>     https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> 
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, 
> or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?

IMHO the general rule in crypto is: encrypt everything, or don't bother.

If you don't encrypt some things, somebody is going to find loopholes 
and sidechannels
and partial-plaintext attacks. Just a silly example: If you trick the DB 
into putting only one row per page,
any "bit-per-page" map suddenly reveals information about a single 
encrypted row that it shouldn't reveal.
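
The "bit-per-page" leak above can be shown with a toy model (hypothetical
code, nothing from PostgreSQL internals): if an attacker can read an
unencrypted per-page bitmap while each page holds a single encrypted row,
diffing two snapshots of the bitmap pinpoints exactly which row changed.

```python
# Toy model of the leak: a table tricked into one row per page, plus an
# unencrypted one-bit-per-page map (like PostgreSQL's visibility map).
# The row contents are opaque ciphertext, but the bitmap is plaintext.

def update_row(vm, page_no):
    """An UPDATE clears the page's all-visible bit until the next vacuum."""
    vm[page_no] = 0

vm = [1] * 8                 # 8 pages, all initially marked all-visible
before = list(vm)            # attacker snapshots the plaintext bitmap

update_row(vm, page_no=5)    # victim updates one (encrypted) row

# Diffing the two snapshots pinpoints the modified row without ever
# touching a key.
changed_rows = [i for i in range(len(vm)) if before[i] != vm[i]]
print(changed_rows)  # -> [5]
```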

Many people with a lot of free time on their hands will sit around, 
drink a nice cup of tea and come up
with all sorts of attacks on these things that you didn't (and couldn't) 
anticipate now.

So IMHO it would be much better to err on the side of caution and 
encrypt everything possible.

Best regards,

Tels



Re: Transparent Data Encryption (TDE) and encrypted files

From
"Moon, Insung"
Date:
Dear Tels.

On Tue, Oct 1, 2019 at 4:33 PM Tels <nospam-pg-abuse@bloodgate.com> wrote:
>
> Moin,
>
> On 2019-09-30 23:26, Bruce Momjian wrote:
> > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > overflow).  The plan is:
> >
> >       https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> >
> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
> > or
> > other files.  Is that correct?  Do any other PGDATA files contain user
> > data?
>
> IMHO the general rule in crypto is: encrypt everything, or don't bother.
>
> If you don't encrypt some things, somebody is going to find loopholes
> and sidechannels
> and partial-plaintext attacks. Just a silly example: If you trick the DB
> into putting only one row per page,
> any "bit-per-page" map suddenly reveals information about a single
> encrypted row that it shouldn't reveal.
>
> Many people with a lot of free time on their hands will sit around,
> drink a nice cup of tea and come up
> with all sorts of attacks on these things that you didn't (and couldn't)
> anticipate now.

These are my thoughts, but to minimize overhead, we try not to encrypt
data that does not store anything confidential.

I'm not a security expert, so I may be wrong, but isn't it more
dangerous to encrypt predictable data?

For example, for data other than what the user entered, it may be
possible to predict the plaintext. If such data is encrypted, I think
that could become a security problem.

Of course, a separate encryption key would be used. But I thought it
would be a problem if confidential data were encrypted using the same
key as the attacked data.
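
The concern about predictable plaintext is really a property of the
encryption mode rather than of encryption itself. A toy sketch (HMAC as a
stand-in pseudorandom function, not a real cipher, and all names made up):
a deterministic, ECB-style scheme maps equal plaintext blocks to equal
ciphertext blocks and so leaks patterns, while mixing in a unique per-block
nonce, as CTR-style modes do, hides even equality.

```python
import hmac, hashlib

KEY = b"toy-demo-key"

def ecb_like(block: bytes) -> bytes:
    # Deterministic: the same plaintext block always yields the same output.
    return hmac.new(KEY, block, hashlib.sha256).digest()

def ctr_like(block: bytes, nonce: int) -> bytes:
    # A unique per-block nonce makes equal plaintexts encrypt differently.
    return hmac.new(KEY, nonce.to_bytes(8, "big") + block,
                    hashlib.sha256).digest()

blocks = [b"PREDICTABLE", b"PREDICTABLE", b"secret"]
ecb_ct = [ecb_like(b) for b in blocks]
ctr_ct = [ctr_like(b, n) for n, b in enumerate(blocks)]

print(ecb_ct[0] == ecb_ct[1])  # -> True: pattern leak from predictable data
print(ctr_ct[0] == ctr_ct[1])  # -> False: the nonce hides even equality
```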

Best regards.
Moon.


>
> So IMHO it would be much better to err on the side of caution and
> encrypt everything possible.
>
> Best regards,
>
> Tels
>
>



Re: Transparent Data Encryption (TDE) and encrypted files

From
Magnus Hagander
Date:


On Tue, Oct 1, 2019 at 9:33 AM Tels <nospam-pg-abuse@bloodgate.com> wrote:
Moin,

On 2019-09-30 23:26, Bruce Momjian wrote:
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
>
>       https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
> or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?

IMHO the general rule in crypto is: encrypt everything, or don't bother.

If you don't encrypt some things, somebody is going to find loopholes
and sidechannels
and partial-plaintext attacks. Just a silly example: If you trick the DB
into putting only one row per page,
any "bit-per-page" map suddenly reveals information about a single
encrypted row that it shouldn't reveal.

Many people with a lot of free time on their hands will sit around,
drink a nice cup of tea and come up
with all sorts of attacks on these things that you didn't (and couldn't)
anticipate now.

So IMHO it would be much better to err on the side of caution and
encrypt everything possible.

+1.

Unless we are *absolutely* certain, I bet someone will be able to find a side-channel that somehow leaks some data or data-about-data, if we don't encrypt everything. If nothing else, you can get use patterns out of it, and you can make a lot from that. (E.g. by whether transactions are using multixacts or not you can potentially determine which transaction they are, if you know what type of transactions are being issued by the application. In the simplest case, there might be a single pattern where multixacts end up actually being used, and in that case being able to see the multixact data tells you a lot about the system).

As for other things -- by default, we store the log files in text format in the data directory. That contains *loads* of sensitive data in a lot of cases. Will those also be encrypted? 

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Transparent Data Encryption (TDE) and encrypted files

From
"Moon, Insung"
Date:
Dear Magnus Hagander.

On Tue, Oct 1, 2019 at 5:37 PM Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Tue, Oct 1, 2019 at 9:33 AM Tels <nospam-pg-abuse@bloodgate.com> wrote:
>>
>> Moin,
>>
>> On 2019-09-30 23:26, Bruce Momjian wrote:
>> > For full-cluster Transparent Data Encryption (TDE), the current plan is
>> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
>> > overflow).  The plan is:
>> >
>> >       https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>> >
>> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
>> > or
>> > other files.  Is that correct?  Do any other PGDATA files contain user
>> > data?
>>
>> IMHO the general rule in crypto is: encrypt everything, or don't bother.
>>
>> If you don't encrypt some things, somebody is going to find loopholes
>> and sidechannels
>> and partial-plaintext attacks. Just a silly example: If you trick the DB
>> into putting only one row per page,
>> any "bit-per-page" map suddenly reveals information about a single
>> encrypted row that it shouldn't reveal.
>>
>> Many people with a lot of free time on their hands will sit around,
>> drink a nice cup of tea and come up
>> with all sorts of attacks on these things that you didn't (and couldn't)
>> anticipate now.
>>
>> So IMHO it would be much better to err on the side of caution and
>> encrypt everything possible.
>
>
> +1.
>
> Unless we are *absolutely* certain, I bet someone will be able to find a
> side-channel that somehow leaks some data or data-about-data, if we don't
> encrypt everything. If nothing else, you can get use patterns out of it,
> and you can make a lot from that. (E.g. by whether transactions are using
> multixacts or not you can potentially determine which transaction they
> are, if you know what type of transactions are being issued by the
> application. In the simplest case, there might be a single pattern where
> multixacts end up actually being used, and in that case being able to see
> the multixact data tells you a lot about the system.)
>
> As for other things -- by default, we store the log files in text format
> in the data directory. That contains *loads* of sensitive data in a lot
> of cases. Will those also be encrypted?


Maybe... as a result of the discussion so far, we are not encrypting
the server log.

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#What_to_encrypt.2Fdecrypt

I think encrypting the server logs could be a very difficult challenge,
and we would probably need to develop another tool to view the
encrypted logs.

Best regards.
Moon.


>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Tue, Oct 01, 2019 at 06:30:39PM +0900, Moon, Insung wrote:
>Dear  Magnus Hagander.
>
>On Tue, Oct 1, 2019 at 5:37 PM Magnus Hagander <magnus@hagander.net> wrote:
>>
>>
>>
>> On Tue, Oct 1, 2019 at 9:33 AM Tels <nospam-pg-abuse@bloodgate.com> wrote:
>>>
>>> Moin,
>>>
>>> On 2019-09-30 23:26, Bruce Momjian wrote:
>>> > For full-cluster Transparent Data Encryption (TDE), the current plan is
>>> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
>>> > overflow).  The plan is:
>>> >
>>> >       https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>>> >
>>> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
>>> > or
>>> > other files.  Is that correct?  Do any other PGDATA files contain user
>>> > data?
>>>
>>> IMHO the general rule in crypto is: encrypt everything, or don't bother.
>>>
>>> If you don't encrypt some things, somebody is going to find loopholes
>>> and sidechannels
>>> and partial-plaintext attacks. Just a silly example: If you trick the DB
>>> into putting only one row per page,
>>> any "bit-per-page" map suddenly reveals information about a single
>>> encrypted row that it shouldn't reveal.
>>>
>>> Many people with a lot of free time on their hands will sit around,
>>> drink a nice cup of tea and come up
>>> with all sorts of attacks on these things that you didn't (and couldn't)
>>> anticipate now.
>>>
>>> So IMHO it would be much better to err on the side of caution and
>>> encrypt everything possible.
>>
>>
>> +1.
>>
>> Unless we are *absolutely* certain, I bet someone will be able to find
>> a side-channel that somehow leaks some data or data-about-data, if we
>> don't encrypt everything. If nothing else, you can get use patterns out
>> of it, and you can make a lot from that. (E.g. by whether transactions
>> are using multixacts or not you can potentially determine which
>> transaction they are, if you know what type of transactions are being
>> issued by the application. In the simplest case, there might be a single
>> pattern where multixacts end up actually being used, and in that case
>> being able to see the multixact data tells you a lot about the system.)
>>
>> As for other things -- by default, we store the log files in text format
>> in the data directory. That contains *loads* of sensitive data in a lot
>> of cases. Will those also be encrypted?
>
>
>Maybe... as a result of the discussion so far, we are not encrypting
>the server log.
>
>https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#What_to_encrypt.2Fdecrypt
>
>I think encrypting the server logs can be a very difficult challenge,
>and we will probably need to develop another tool to view the
>encrypted server logs.
>

IMO leaks of sensitive data into the server log (say, as part of error
messages, slow queries, ...) are a serious issue. It's one of the main
issues with pgcrypto-style encryption, because it's trivial to leak e.g.
keys into the server log. Even if proper key management prevents leaking
keys, there is still user data - say, credit card numbers and such.

So I don't see how we could not encrypt the server log, in the end.

But yes, you're right that it's a challenging topic.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Tue, Oct  1, 2019 at 03:48:31PM +0200, Tomas Vondra wrote:
> IMO leaks of sensitive data into the server log (say, as part of error
> messages, slow queries, ...) are a serious issue. It's one of the main
> issues with pgcrypto-style encryption, because it's trivial to leak e.g.
> keys into the server log. Even if proper key management prevents leaking
> keys, there is still user data - say, credit card numbers and such.

Fortunately, the full-cluster encryption keys are stored encrypted in
pg_control and are never accessible unencrypted at the SQL level.
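
A minimal sketch of that arrangement (envelope encryption; the
XOR-with-hash wrap below is a toy stand-in for a real key-wrap algorithm,
and none of these names come from the actual patch): what gets stored on
disk is only the wrapped form of the data key, and unwrapping requires a
key-encryption key held outside the cluster.

```python
import hashlib
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def wrap(dek: bytes, kek: bytes) -> bytes:
    # Toy wrap: XOR with a pad derived from the KEK. Real systems use a
    # proper algorithm such as AES Key Wrap; this only shows the shape.
    pad = hashlib.sha256(kek).digest()[:len(dek)]
    return xor(dek, pad)

unwrap = wrap  # XOR wrapping is its own inverse

kek = b"key-encryption-key-held-elsewhere"  # never stored in the cluster
dek = secrets.token_bytes(16)               # the actual data-encryption key

wrapped = wrap(dek, kek)   # only this wrapped form would hit disk
assert wrapped != dek
assert unwrap(wrapped, kek) == dek
print("DEK round-trips through the wrap; plaintext key never written out")
```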

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Mon, Sep 30, 2019 at 05:26:33PM -0400, Bruce Momjian wrote:
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
> 
>     https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> 
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?

Oh, there is also the consideration that the pg_replslot directory
might also contain user data.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian <bruce@momjian.us> wrote:
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
>
>         https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?

As others have said, that sounds wrong to me.  I think you need to
encrypt everything.

I'm not sold on the comments that have been made about encrypting the
server log. I agree that could leak data, but that seems like somebody
else's problem: the log files aren't really under PostgreSQL's
management in the same way as pg_clog is. If you want to secure your
logs, send them to syslog and configure it to do whatever you need.
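
The redirection Robert describes uses existing settings; a sketch of the
relevant postgresql.conf lines (the facility and ident values here are
examples):

```ini
# postgresql.conf: send server messages to syslog rather than to
# plaintext log files under the data directory.
log_destination = 'syslog'
logging_collector = off        # no stderr capture files in PGDATA
syslog_facility = 'local0'     # example facility; match your syslog config
syslog_ident = 'postgres'
```

From there, protecting the logs (an encrypted filesystem for the log
partition, restricted forwarding, and so on) is handled by the syslog
layer.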

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian <bruce@momjian.us> wrote:
> > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > overflow).  The plan is:
> >
> >         https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> >
> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> > other files.  Is that correct?  Do any other PGDATA files contain user
> > data?
>
> As others have said, that sounds wrong to me.  I think you need to
> encrypt everything.

That isn't what other database systems do though and isn't what people
actually asking for this feature are expecting to have or deal with.

People who are looking for 'encrypt all the things' should and will be
looking at filesystem-level encryption options.  That's not what this
feature is about.

> I'm not sold on the comments that have been made about encrypting the
> server log. I agree that could leak data, but that seems like somebody
> else's problem: the log files aren't really under PostgreSQL's
> management in the same way as pg_clog is. If you want to secure your
> logs, send them to syslog and configure it to do whatever you need.

I agree with this.

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
>Greetings,
>
>* Robert Haas (robertmhaas@gmail.com) wrote:
>> On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian <bruce@momjian.us> wrote:
>> > For full-cluster Transparent Data Encryption (TDE), the current plan is
>> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
>> > overflow).  The plan is:
>> >
>> >         https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>> >
>> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
>> > other files.  Is that correct?  Do any other PGDATA files contain user
>> > data?
>>
>> As others have said, that sounds wrong to me.  I think you need to
>> encrypt everything.
>
>That isn't what other database systems do though and isn't what people
>actually asking for this feature are expecting to have or deal with.
>
>People who are looking for 'encrypt all the things' should and will be
>looking at filesystem-level encryption options.  That's not what this
>feature is about.
>

That's almost certainly not true, at least not universally.

It may be true for some people, but a lot of the people asking for
in-database encryption essentially want to do filesystem encryption but
can't use it for various reasons. E.g. because they're running in
environments that make filesystem encryption impossible to use (OS not
supporting it directly, no access to the block device, lack of admin
privileges, ...). Or maybe they worry about people with fs access.

If you look at the two threads discussing the FDE design, both of
them pretty much started as "let's do FDE in the database".

>> I'm not sold on the comments that have been made about encrypting the
>> server log. I agree that could leak data, but that seems like somebody
>> else's problem: the log files aren't really under PostgreSQL's
>> management in the same way as pg_clog is. If you want to secure your
>> logs, send them to syslog and configure it to do whatever you need.
>
>I agree with this.
>

I don't. I know it's not an easy problem to solve, but it may contain
user data (which is what we manage). We may allow disabling that, at
which point it becomes someone else's problem.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
> >People who are looking for 'encrypt all the things' should and will be
> >looking at filesystem-level encryption options.  That's not what this
> >feature is about.
>
> That's almost certainly not true, at least not universally.
>
> It may be true for some people, but a lot of the people asking for
> in-database encryption essentially want to do filesystem encryption but
> can't use it for various reasons. E.g. because they're running in
> environments that make filesystem encryption impossible to use (OS not
> supporting it directly, no access to the block device, lack of admin
> privileges, ...). Or maybe they worry about people with fs access.

Anyone coming from other database systems isn't asking for that though
and it wouldn't be a comparable offering to other systems.

> If you look at the two threads discussing the FDE design, both of
> them pretty much started as "let's do FDE in the database".

And that's how some folks continue to see it: let's just encrypt all the
things, until they actually look at it and start thinking about what
that means and how to implement it.

Yeah, it'd be great to just encrypt everything, with a bunch of
different keys, all of which are stored somewhere else, and can be
updated and changed by the user when they need to do a rekeying, but
then you have to start asking which keys need to be available, and when,
for crash recovery, how do you handle a crash in the middle of a
rekeying, how do you handle updating keys from the user, etc..

Sure, we could offer a dead simple "here, use this one key at database
start to just encrypt everything" and that would be enough for some set
of users (a very small set, imv, but that's subjective, obviously), but
I don't think we could dare promote that as having TDE because it
wouldn't be at all comparable to what other databases have, and it
wouldn't materially move us in the direction of having real TDE.

> >>I'm not sold on the comments that have been made about encrypting the
> >>server log. I agree that could leak data, but that seems like somebody
> >>else's problem: the log files aren't really under PostgreSQL's
> >>management in the same way as pg_clog is. If you want to secure your
> >>logs, send them to syslog and configure it to do whatever you need.
> >
> >I agree with this.
>
> I don't. I know it's not an easy problem to solve, but it may contain
> user data (which is what we manage). We may allow disabling that, at
> which point it becomes someone else's problem.

We also send user data to clients, but I don't imagine we're suggesting
that we need to control what some downstream application does with that
data or how it gets stored.  There's definitely a lot of room for
improvement in our logging (in an ideal world, we'd have a way to
actually store the logs in the database, at which point it could be
encrypted or not that way...), but I'm not seeing the need for us to
have a way to encrypt the log files.  If we did encrypt them, we'd have
to make sure to do it in a way that users could still access them
without the database being up and running, which might be tricky if the
key is in the vault...

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Peter Eisentraut
Date:
On 2019-10-03 16:40, Stephen Frost wrote:
>> As others have said, that sounds wrong to me.  I think you need to
>> encrypt everything.
> That isn't what other database systems do though and isn't what people
> actually asking for this feature are expecting to have or deal with.

It is what some other database systems do.  Perhaps some others don't.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 2019-10-03 16:40, Stephen Frost wrote:
> >> As others have said, that sounds wrong to me.  I think you need to
> >> encrypt everything.
> > That isn't what other database systems do though and isn't what people
> > actually asking for this feature are expecting to have or deal with.
>
> It is what some other database systems do.  Perhaps some others don't.

I looked at the contemporary databases and provided details about all of
them earlier in the thread.  Please feel free to review that and let me
know if your research shows differently.

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Thu, Oct 03, 2019 at 11:51:41AM -0400, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
>> >People who are looking for 'encrypt all the things' should and will be
>> >looking at filesystem-level encryption options.  That's not what this
>> >feature is about.
>>
>> That's almost certainly not true, at least not universally.
>>
>> It may be true for some people, but a lot of the people asking for
>> in-database encryption essentially want to do filesystem encryption but
>> can't use it for various reasons. E.g. because they're running in
>> environments that make filesystem encryption impossible to use (OS not
>> supporting it directly, no access to the block device, lack of admin
>> privileges, ...). Or maybe they worry about people with fs access.
>
>Anyone coming from other database systems isn't asking for that though
>and it wouldn't be a comparable offering to other systems.
>

I don't think that's quite accurate. In the previous message you claimed
(1) this isn't what other database systems do and (2) people who want to
encrypt everything should just use fs encryption, because that's not
what TDE is about.

Regarding (1), I'm pretty sure Oracle TDE does pretty much exactly this,
at least in the mode with tablespace-level encryption. It's true there
is also column-level mode, but from my experience it's far less used
because it has a number of annoying limitations.

So I'm somewhat puzzled by your claim that people coming from other
systems are asking for the column-level mode. At least I'm assuming
that's what they're asking for, because I don't see other options.

>> If you look at the two threads discussing the FDE design, both of
>> them pretty much started as "let's do FDE in the database".
>
>And that's how some folks continue to see it: let's just encrypt all the
>things, until they actually look at it and start thinking about what
>that means and how to implement it.
>

This argument also works the other way, though. On Oracle, people often
start with the column-level encryption because it seems naturally
superior (hey, I can encrypt just the columns I want, ...) and then they
start running into the various limitations and eventually just switch to
the tablespace-level encryption.

Now, maybe we'll be able to solve those limitations - but I think it's
pretty unlikely, because those limitations seem quite inherent to how
encryption affects indexes etc.

>Yeah, it'd be great to just encrypt everything, with a bunch of
>different keys, all of which are stored somewhere else, and can be
>updated and changed by the user when they need to do a rekeying, but
>then you have to start asking which keys need to be available, and when,
>for crash recovery, how do you handle a crash in the middle of a
>rekeying, how do you handle updating keys from the user, etc..
>
>Sure, we could offer a dead simple "here, use this one key at database
>start to just encrypt everything" and that would be enough for some set
>of users (a very small set, imv, but that's subjective, obviously), but
>I don't think we could dare promote that as having TDE because it
>wouldn't be at all comparable to what other databases have, and it
>wouldn't materially move us in the direction of having real TDE.
>

I think that very much depends on the definition of "real TDE" -- I
don't know exactly what that means at this point. And as I said before,
I think such a simple mode *is* comparable to (at least some) solutions
available in other databases (as explained above).

As for the users, I don't have any objective data about this, but I
think the number of people wanting such a simple solution is
non-trivial. That does not mean we can't extend it to support more
advanced features.

>> >>I'm not sold on the comments that have been made about encrypting the
>> >>server log. I agree that could leak data, but that seems like somebody
>> >>else's problem: the log files aren't really under PostgreSQL's
>> >>management in the same way as pg_clog is. If you want to secure your
>> >>logs, send them to syslog and configure it to do whatever you need.
>> >
>> >I agree with this.
>>
>> I don't. I know it's not an easy problem to solve, but it may contain
>> user data (which is what we manage). We may allow disabling that, at
>> which point it becomes someone else's problem.
>
>We also send user data to clients, but I don't imagine we're suggesting
>that we need to control what some downstream application does with that
>data or how it gets stored.  There's definitely a lot of room for
>improvement in our logging (in an ideal world, we'd have a way to
>actually store the logs in the database, at which point it could be
>encrypted or not that way...), but I'm not seeing the need for us to
>have a way to encrypt the log files.  If we did encrypt them, we'd have
>to make sure to do it in a way that users could still access them
>without the database being up and running, which might be tricky if the
>key is in the vault...
>

That's a bit of a straw-man argument, really. The client is obviously
meant to receive and handle sensitive data; that's its main purpose.
For a logging system the situation is a bit different: it's a general
purpose tool with no idea what the data is.

I do understand it's pretty pointless to send encrypted messages to such
external tools, but IMO it'd be good to implement that at least for our
internal logging collector.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Thu, Oct 03, 2019 at 11:58:55AM -0400, Stephen Frost wrote:
>Greetings,
>
>* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
>> On 2019-10-03 16:40, Stephen Frost wrote:
>> >> As others have said, that sounds wrong to me.  I think you need to
>> >> encrypt everything.
>> > That isn't what other database systems do though and isn't what people
>> > actually asking for this feature are expecting to have or deal with.
>>
>> It is what some other database systems do.  Perhaps some others don't.
>
>I looked at the contemporary databases and provided details about all of
>them earlier in the thread.  Please feel free to review that and let me
>know if your research shows differently.
>

I assume you mean this (in one of the other threads):

https://www.postgresql.org/message-id/20190817175217.GE16436%40tamriel.snowman.net

FWIW I don't see anything contradicting the idea of just encrypting
everything (including vm, fsm etc.). The only case that seems to be an
exception is the column-level encryption in Oracle; all the other
options (especially the database-level ones) seem to be consistent with
this principle.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Thu, Oct 03, 2019 at 11:51:41AM -0400, Stephen Frost wrote:
> >* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >>On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
> >>>People who are looking for 'encrypt all the things' should and will be
> >>>looking at filesystem-level encryption options.  That's not what this
> >>>feature is about.
> >>
> >>That's almost certainly not true, at least not universally.
> >>
> >>It may be true for some people, but a lot of the people asking for
> >>in-database encryption essentially want to do filesystem encryption but
> >>can't use it for various reasons. E.g. because they're running in
> >>environments that make filesystem encryption impossible to use (OS not
> >>supporting it directly, no access to the block device, lack of admin
> >>privileges, ...). Or maybe they worry about people with fs access.
> >
> >Anyone coming from other database systems isn't asking for that though
> >and it wouldn't be a comparable offering to other systems.
>
> I don't think that's quite accurate. In the previous message you claimed
> (1) this isn't what other database systems do and (2) people who want to
> encrypt everything should just use fs encryption, because that's not
> what TDE is about.
>
> Regarding (1), I'm pretty sure Oracle TDE does pretty much exactly this,
> at least in the mode with tablespace-level encryption. It's true there
> is also column-level mode, but from my experience it's far less used
> because it has a number of annoying limitations.

We're probably being too general and that's ending up with us talking
past each other.  Yes, Oracle provides tablespace and column level
encryption, but neither case results in *everything* being encrypted.

> So I'm somewhat puzzled by your claim that people coming from other
> systems are asking for the column-level mode. At least I'm assuming
> that's what they're asking for, because I don't see other options.

I've seen asks for tablespace, table, and column-level, but it's always
been about the actual data.  Something like clog is an entirely internal
structure that doesn't include the actual data.  Yes, it's possible it
could somehow be used for a side-channel attack, as could other things,
such as WAL, and as such I'm not sure that forcing a policy of "encrypt
everything" is actually a sensible approach and it definitely adds
complexity and makes it a lot more difficult to come up with a sensible
solution.

> >>If you look at the two threads discussing the FDE design, both of
> >>them pretty much started as "let's do FDE in the database".
> >
> >And that's how some folks continue to see it- let's just encrypt all the
> >things, until they actually look at it and start thinking about what
> >that means and how to implement it.
>
> This argument also works the other way, though. On Oracle, people often
> start with the column-level encryption because it seems naturally
> superior (hey, I can encrypt just the columns I want, ...) and then they
> start running into the various limitations and eventually just switch to
> the tablespace-level encryption.
>
> Now, maybe we'll be able to solve those limitations - but I think it's
> pretty unlikely, because those limitations seem quite inherent to how
> encryption affects indexes etc.

It would probably be useful to discuss the specific limitations that
you've seen cause people to move away from column-level encryption.

I definitely agree that figuring out how to make things work with
indexes is a non-trivial challenge, though I'm hopeful that we can come
up with something sensible.

> >Yeah, it'd be great to just encrypt everything, with a bunch of
> >different keys, all of which are stored somewhere else, and can be
> >updated and changed by the user when they need to do a rekeying, but
> >then you have to start asking about what keys need to be available when
> >for doing crash recovery, how do you handle a crash in the middle of a
> >rekeying, how do you handle updating keys from the user, etc..
> >
> >Sure, we could offer a dead simple "here, use this one key at database
> >start to just encrypt everything" and that would be enough for some set
> >of users (a very small set, imv, but that's subjective, obviously), but
> >I don't think we could dare promote that as having TDE because it
> >wouldn't be at all comparable to what other databases have, and it
> >wouldn't materially move us in the direction of having real TDE.
>
> I think that very much depends on the definition of "real TDE".  I
> don't know what exactly that means at this point. And as I said before,
> I think such a simple mode *is* comparable to (at least some) solutions
> available in other databases (as explained above).

When I was researching this, I couldn't find any example of a database
that wouldn't start without the one magic key that encrypts everything.
I'm happy to be told that I was wrong in my understanding of that, with
some examples.

> As for the users, I don't have any objective data about this, but I
> think the number of people wanting such a simple solution is non-trivial.
> That does not mean we can't extend it to support more advanced features.

The concern that I raised before and that I continue to worry about is
that providing such a simple capability will have a lot of limitations
too (such as having a single key and only being able to rekey during a
complete downtime, because we have to re-encrypt clog, etc, etc), and
I don't see it helping us get to more granular TDE because, for that,
where we really need to start is by building a vault of some kind to
store the keys in and then figuring out how we do things like crash
recovery in a sensible way and, ideally, without needing to have access
to all of (any of?) the keys.

> >>>>I'm not sold on the comments that have been made about encrypting the
> >>>>server log. I agree that could leak data, but that seems like somebody
> >>>>else's problem: the log files aren't really under PostgreSQL's
> >>>>management in the same way as pg_clog is. If you want to secure your
> >>>>logs, send them to syslog and configure it to do whatever you need.
> >>>
> >>>I agree with this.
> >>
> >>I don't. I know it's not an easy problem to solve, but it may contain
> >>user data (which is what we manage). We may allow disabling that, at
> >>which point it becomes someone else's problem.
> >
> >We also send user data to clients, but I don't imagine we're suggesting
> >that we need to control what some downstream application does with that
> >data or how it gets stored.  There's definitely a lot of room for
> >improvement in our logging (in an ideal world, we'd have a way to
> >actually store the logs in the database, at which point it could be
> >encrypted or not that way...), but I'm not seeing the need for us to
> >have a way to encrypt the log files.  If we did encrypt them, we'd have
> >to make sure to do it in a way that users could still access them
> >without the database being up and running, which might be tricky if the
> >key is in the vault...
>
> That's a bit of a straw-man argument, really. The client is obviously
> meant to receive and handle sensitive data, that's its main purpose.
> For logging systems the situation is a bit different, it's a general
> purpose tool, with no idea what the data is.

The argument you're making is that the log isn't intended to have
sensitive data, but while that might be a nice place to get to, we
certainly aren't there today, which means that people should really be
sending the logs to a location that's trusted.

> I do understand it's pretty pointless to send encrypted message to such
> external tools, but IMO it'd be good to implement that at least for our
> internal logging collector.

It's also less than user friendly to log to encrypted files that you
can't read without the database system being up, so we'd have to
figure out at least a solution to that problem, and then if you have
downstream systems where the logs are going to, you have to decrypt
them, or have a way to have them not be encrypted perhaps.

In general, wrt the logs, I feel like it's at least a reasonably small
and independent piece of this, though I wonder if it'll cause similar
problems when it comes to dealing with crash recovery (how do we log if
we don't have the key from the vault because we haven't done crash
recovery yet, for example...).

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Thu, Oct 03, 2019 at 11:58:55AM -0400, Stephen Frost wrote:
> >* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> >>On 2019-10-03 16:40, Stephen Frost wrote:
> >>>> As others have said, that sounds wrong to me.  I think you need to
> >>>> encrypt everything.
> >>> That isn't what other database systems do though and isn't what people
> >>> actually asking for this feature are expecting to have or deal with.
> >>
> >>It is what some other database systems do.  Perhaps some others don't.
> >
> >I looked at the contemporary databases and provided details about all of
> >them earlier in the thread.  Please feel free to review that and let me
> >know if your research shows differently.
>
> I assume you mean this (in one of the other threads):
>
> https://www.postgresql.org/message-id/20190817175217.GE16436%40tamriel.snowman.net
>
> FWIW I don't see anything contradicting the idea of just encrypting
> everything (including vm, fsm etc.). The only case that seems to be an
> exception is the column-level encryption in Oracle, all the other
> options (especially the database-level ones) seem to be consistent with
> this principle.

I don't think I was arguing specifically about VM/FSM in particular but
rather about things which, for us, are cluster level.  Admittedly, some
other database systems put more things into tablespaces or databases
than we do (it'd sure be nice if we did in some cases too, but we
don't...), but they do also have things *outside* of those, such that
you can at least bring the system up, to some extent, even if you can't
access a given tablespace or database.

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Thu, Oct 3, 2019 at 1:29 PM Stephen Frost <sfrost@snowman.net> wrote:
> I don't think I was arguing specifically about VM/FSM in particular but
> rather about things which, for us, are cluster level.  Admittedly, some
> other database systems put more things into tablespaces or databases
> than we do (it'd sure be nice if we did in some cases too, but we
> don't...), but they do also have things *outside* of those, such that
> you can at least bring the system up, to some extent, even if you can't
> access a given tablespace or database.

It sounds like you're making this up as you go along. The security
ramifications of encrypting a file don't depend on whether that file
is database-level or cluster-level, but rather on whether the contents
could be useful to an attacker. It doesn't seem like it would require
much work at all to construct an argument that a hacker might enjoy
having unfettered access to pg_clog even if no other part of the
database can be read.

My perspective on this feature is, and has always been, that there are
two different things somebody might want, both of which we seem to be
calling "TDE." One is to encrypt every single data page in the cluster
(and possibly things other than data pages, but at least those) with a
single encryption key, much as filesystem encryption would do, but
internal to the database. Contrary to your assertions, such a solution
has useful properties. One is that it will work the same way on any
system where PostgreSQL runs, whereas filesystem encryption solutions
vary. Another is that it does not require the cooperation of the
person who has root in order to set up. A third is that someone with
access to the system does not have automatic and unfettered access to
the database's data; sure, they can get it with enough work, but it's
significantly harder to fish the encryption keys out of the memory
space of a running process than to tar up the data directory that the
filesystem has already decrypted for you. I would personally not care
about any of this based on my own background as somebody who generally
had to set up systems from scratch, starting with buying the
hardware, but in enterprise and government environments they can pose
significant problems.

The other thing people sometimes want is to encrypt some of the data
within the database but not all of it. In my view, trying to implement
this is not a great idea, because it's vastly more complicated than
just encrypting everything with one key. Would I like to have the
feature? Sure. Do I expect that we're going to get that feature any
time soon? Nope. Even the thing I described in the previous paragraph,
as limited as it is, is complicated and could take several release
cycles to get into committable shape. Fine-grained encryption is
probably an order of magnitude more complicated. The problem of
figuring out which keys apply to which objects does not seem to have
any reasonably simple solution, assuming you want something that's
neither insecure nor a badly-done hack.

I am unsure what the thought process is among people, such as
yourself, who are arguing that fine-grained encryption is the only way
to go. It seems like you're determined to refuse a free Honda Civic on
the grounds that it's not a Cadillac. It's not even like accepting the
patch for the Honda Civic solution would some how block accepting the
Cadillac if that shows up later. It wouldn't. It would just mean that,
unless or until that patch shows up, we'd have something rather than
nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Oct 3, 2019 at 1:29 PM Stephen Frost <sfrost@snowman.net> wrote:
> > I don't think I was arguing specifically about VM/FSM in particular but
> > rather about things which, for us, are cluster level.  Admittedly, some
> > other database systems put more things into tablespaces or databases
> > than we do (it'd sure be nice if we did in some cases too, but we
> > don't...), but they do also have things *outside* of those, such that
> > you can at least bring the system up, to some extent, even if you can't
> > access a given tablespace or database.
>
> It sounds like you're making this up as you go along.

I'm not surprised, and I doubt that's really got much to do with the
actual topic.

> The security
> ramifications of encrypting a file don't depend on whether that file
> is database-level or cluster-level, but rather on whether the contents
> could be useful to an attacker.

I don't believe that I claimed otherwise.  I agree with this.

> It doesn't seem like it would require
> much work at all to construct an argument that a hacker might enjoy
> having unfettered access to pg_clog even if no other part of the
> database can be read.

The question isn't about what hackers would like to have access to, it's
about what would actually provide them with a channel to get information
that's sensitive, and at what rate.  Perhaps there's an argument to be
made that clog would provide a high enough rate of information that
could be used to glean sensitive information, but that's certainly not
an argument that's been put forth, instead it's the knee-jerk reaction
of "oh goodness, if anything isn't encrypted then hackers will be able
to get access to everything" and that's just not a real argument.

> My perspective on this feature is, and has always been, that there are
> two different things somebody might want, both of which we seem to be
> calling "TDE." One is to encrypt every single data page in the cluster
> (and possibly things other than data pages, but at least those) with a
> single encryption key, much as filesystem encryption would do, but
> internal to the database.

Making it all up as I go along notwithstanding, I did go look at other
database systems which I considered on-par with PG, shared that
information here, and am basing my comments on that review.

Which database systems have you looked at which have the properties
you're describing above that we should be working hard towards?

> The other thing people sometimes want is to encrypt some of the data
> within the database but not all of it. In my view, trying to implement
> this is not a great idea, because it's vastly more complicated than
> just encrypting everything with one key.

Which database systems that you'd consider to be on-par with PG, and
which do have TDE, don't have some mechanism for supporting multiple
keys and for encrypting only a subset of the data?

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Magnus Hagander
Date:
On Fri, Oct 4, 2019 at 3:42 AM Stephen Frost <sfrost@snowman.net> wrote:

> It doesn't seem like it would require
> much work at all to construct an argument that a hacker might enjoy
> having unfettered access to pg_clog even if no other part of the
> database can be read.

The question isn't about what hackers would like to have access to, it's
about what would actually provide them with a channel to get information
that's sensitive, and at what rate.  Perhaps there's an argument to be
made that clog would provide a high enough rate of information that
could be used to glean sensitive information, but that's certainly not
an argument that's been put forth, instead it's the knee-jerk reaction
of "oh goodness, if anything isn't encrypted then hackers will be able
to get access to everything" and that's just not a real argument.

Huh. That is *exactly* the argument I made. Though granted the example was on multixact primarily, because I think that is much more likely to leak interesting information, but the basis certainly applies to all the metadata.

--

Re: Transparent Data Encryption (TDE) and encrypted files

From
Magnus Hagander
Date:
On Thu, Oct 3, 2019 at 4:40 PM Stephen Frost <sfrost@snowman.net> wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian <bruce@momjian.us> wrote:
> > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > overflow).  The plan is:
> >
> >         https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> >
> > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> > other files.  Is that correct?  Do any other PGDATA files contain user
> > data?
>
> As others have said, that sounds wrong to me.  I think you need to
> encrypt everything.

That isn't what other database systems do though and isn't what people
actually asking for this feature are expecting to have or deal with.

Do any of said other databases even *have* the equivalent of, say, pg_clog or pg_multixact *stored outside their tablespaces*? (Because as long as the data is in the tablespace, it's encrypted when using tablespace encryption.)

--

Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Thu, Oct 3, 2019 at 9:42 PM Stephen Frost <sfrost@snowman.net> wrote:
> > It doesn't seem like it would require
> > much work at all to construct an argument that a hacker might enjoy
> > having unfettered access to pg_clog even if no other part of the
> > database can be read.
>
> The question isn't about what hackers would like to have access to, it's
> about what would actually provide them with a channel to get information
> that's sensitive, and at what rate.  Perhaps there's an argument to be
> made that clog would provide a high enough rate of information that
> could be used to glean sensitive information, but that's certainly not
> an argument that's been put forth, instead it's the knee-jerk reaction
> of "oh goodness, if anything isn't encrypted then hackers will be able
> to get access to everything" and that's just not a real argument.

Well, I gather that you didn't much like my characterization of your
argument as "making it up as you go along," which is probably fair,
but I doubt that the people who are arguing that we should encrypt
everything will appreciate your characterization of their argument as
"knee-jerk" any better.

I think everyone would agree that if you have no information about a
database other than the contents of pg_clog, that's not a meaningful
information leak. You would be able to tell which transactions
committed and which transactions aborted, but since you know nothing
about the data inside those transactions, it's of no use to you.
However, in that situation, you probably wouldn't be attacking the
database in the first place. Most likely you have some knowledge about
what it contains. Maybe there's a stream of sensor data that flows
into the database, and you can see that stream.  By watching pg_clog,
you can see when a particular bit of data is rejected. That could be
valuable.

To take a highly artificial example, suppose that the database is fed
by secret video cameras which identify the faces of everyone who
boards a commercial aircraft and records all of those names in a
database, but high-ranking government officials are exempt from the
program and there's a white-list of people whose names can't be
inserted. When the system tries, a constraint violation occurs and the
transaction aborts.  Now, if you see a transaction abort show up in
pg_clog, you know that either a high-ranking government official just
tried to walk onto a plane, or the system is broken.  If you see a
whole bunch of aborts within a few hours of each other, separated by
lots of successful insertions, maybe you can infer a cabinet meeting.
I don't know. That's a little bit of a stretch, but I don't see any
reason why something like that can't happen. There are probably more
plausible examples.

The point is that it's unreasonable, at least in my view, to decide
that the knowledge of which transactions commit and which transactions
abort isn't sensitive. Yeah, on a lot of systems it won't be, but on
some systems it might be, so it should be encrypted.

What I really find puzzling here is that Cybertec had a patch that
encrypted -- well, I don't remember whether it encrypted this, but it
encrypted a lot of stuff, and it spent a lot of time being concerned
about these exact kinds of issues.  I know for example that they
thought about the stats file, which is an even more clear vector for
information leakage than we're talking about here.  They thought about
logical decoding spill files, also a clear vector for information
leakage. Pretty sure they also thought about WAL. That's all really
important stuff, and one thing I learned from reading that patch is
that you can't solve those problems in a trivial, mechanical way. Some
of those systems currently write data byte-by-byte, and converting
them to work block-by-block makes encrypting them a lot easier. So it
seems to me that even if you think that patch had the dumbest key
management system in the history of the universe, you ought to be
embracing some of the ideas that are in that patch because they'll
make any future encryption project easier. Instead of arguing about
whether these side-channel attacks are important -- and I seem not to
be alone here in believing that they are -- we could be working to get
code that has already been written to help solve those problems
committed.

I ask again -- why are you so opposed to a single-key,
encrypt-everything approach? Even if you think multiple-key,
encrypt-only-some-things is better, they don't have to block each
other.

> Which database systems have you looked at which have the properties
> you're describing above that we should be working hard towards?

I haven't studied other database systems much myself. I have, however,
talked with coworkers of mine who are trying to convince people to use
PostgreSQL and/or Advanced Server, and I've heard a lot from them
about what the customers with whom they work would like to see. I base
my comments on those conversations. What I hear from them is basically
that anything we could give them would help. More would be better than
less, of course. People would like a solution with key rotation better
than one without; fine-grained encryption better than coarse-grained
encryption; less performance overhead better than more; and an
encryption algorithm perceived as highly secure better than one
perceived as less secure. But having anything at all would help.

Secondarily, what I hear is that a lot of EnterpriseDB customers or
potential customers reject filesystem encryption not so much because
it's not sufficiently fine-grained, but rather because it depends on
root@localhost. Getting root@localhost to cooperate is difficult and
undesirable, and also filesystem encryption doesn't help at all to
protect against root@localhost. I've pointed out repeatedly to many
people that putting the encryption inside the database doesn't
*really* fix this problem, because root@localhost can ultimately do
anything. But, as I said in my earlier email, people perceive that if
the filesystem does the encryption, root can just cp all the files and
win, whereas if the database does the encryption, that doesn't work,
and root's got to work harder. That seems to matter to a lot of people
who are talking to my colleagues here at EnterpriseDB. That may, of
course, not matter to your users, and that's fine.  I'm not trying to
block people from attacking this problem from other angles; but I *am*
frustrated that you seem to be trying to block what seems to me to be
the most promising angle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Fri, Oct  4, 2019 at 09:18:58AM -0400, Robert Haas wrote:
> I think everyone would agree that if you have no information about a
> database other than the contents of pg_clog, that's not a meaningful
> information leak. You would be able to tell which transactions
> committed and which transactions aborted, but since you know nothing
> about the data inside those transactions, it's of no use to you.
> However, in that situation, you probably wouldn't be attacking the
> database in the first place. Most likely you have some knowledge about
> what it contains. Maybe there's a stream of sensor data that flows
> into the database, and you can see that stream.  By watching pg_clog,
> you can see when a particular bit of data is rejected. That could be
> valuable.

It is certainly true that seeing activity in _any_ cluster file could
leak information.  However, even if we encrypted all the cluster files,
bad actors could still get information by analyzing the file sizes and
size changes of relation files, and the speed of WAL creation, and even
monitor WAL for write activity (WAL file byte changes).  I would think
that would leak more information than clog.

I am not sure how you could secure against that information leak.  While
file system encryption might do that at the storage layer, it doesn't do
anything at the mounted file system layer.

The current approach is to encrypt anything that contains user data,
which includes heap, index, and WAL files.  I think replication slots
and logical replication might also fall into that category, which is why
I started this thread.

I can see some saying that all cluster files should be encrypted, and I
can respect that argument.  However, as outlined in the diagram linked
to from the blog entry:

    https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019

I feel that TDE, since it has limited value, and can't really avoid all
information leakage, should strive to find the intersection of ease of
implementation, security, and compliance.  If people don't think that
limited file encryption is secure, I get it.  However, encrypting most
or all files I think would lead us into such a "difficult to implement"
scope that I would no longer be able to work on this feature.  I think
the code complexity, fragility, potential unreliability, and even
overhead of trying to encrypt most/all files would lead TDE to be
greatly delayed or never implemented.  I just couldn't recommend it. 
Now, I might be totally wrong, and encryption of everything might be
just fine, but I have to pick my projects, and such an undertaking seems
far too risky for me.

Just for some detail, we have solved the block-level encryption problem
by using CTR mode in most cases, but there is still a requirement for a
nonce for every encryption operation.  You can use derived keys too, but
you need to set up those keys for every write to encrypt files.  Maybe
it is possible to set up a write API that handles this transparently in
the code, but I don't know how to do that cleanly, and I doubt if the
value of encrypting everything is worth it.
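
To make the nonce requirement concrete, here is a minimal sketch (hypothetical
illustration, not PostgreSQL code) of CTR-style page encryption where the nonce
is derived from the page's identity.  Since the Python standard library has no
AES, it uses HMAC-SHA256 as a stand-in keystream generator; a real
implementation would use AES-CTR:

```python
# Illustrative only: HMAC-SHA256 stands in for the AES block cipher that a
# real CTR-mode implementation would use.  Names like encrypt_page and
# relfile_id are hypothetical, not PostgreSQL internals.
import hashlib
import hmac

BLOCK_SIZE = 8192  # PostgreSQL heap page size

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Generate `length` bytes of keystream as PRF(key, nonce || counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_page(key: bytes, relfile_id: int, block_no: int,
                 page: bytes) -> bytes:
    # Derive a unique nonce from the page's identity, so that no two pages
    # share a keystream -- nonce reuse in CTR mode reveals the XOR of the
    # two plaintexts to an attacker.
    nonce = relfile_id.to_bytes(8, "big") + block_no.to_bytes(8, "big")
    ks = keystream(key, nonce, len(page))
    return bytes(a ^ b for a, b in zip(page, ks))

# CTR mode is an XOR stream, so encrypting twice decrypts.
key = b"0" * 32
page = b"user data" + b"\x00" * (BLOCK_SIZE - 9)
ct = encrypt_page(key, 16384, 0, page)
assert encrypt_page(key, 16384, 0, ct) == page
```

The key property this sketch shows is why every encryption operation needs a
fresh nonce: if the same (key, nonce) pair is ever used for two different page
images, XORing the two ciphertexts cancels the keystream and yields the XOR of
the plaintexts, which is one reason in-place page rewrites need care in
CTR-based designs.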

As far as encrypting the log file, I can see us adding documentation to
warn about that, and even issue a server log message if encryption is
enabled and syslog is not being used.  (I don't know how to test if
syslog is being shipped to a remote server.)

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Fri, Oct 04, 2019 at 07:52:48AM +0200, Magnus Hagander wrote:
>On Fri, Oct 4, 2019 at 3:42 AM Stephen Frost <sfrost@snowman.net> wrote:
>
>>
>> > It doesn't seem like it would require
>> > much work at all to construct an argument that a hacker might enjoy
>> > having unfettered access to pg_clog even if no other part of the
>> > database can be read.
>>
>> The question isn't about what hackers would like to have access to, it's
>> about what would actually provide them with a channel to get information
>> that's sensitive, and at what rate.  Perhaps there's an argument to be
>> made that clog would provide a high enough rate of information that
>> could be used to glean sensitive information, but that's certainly not
>> an argument that's been put forth, instead it's the knee-jerk reaction
>> of "oh goodness, if anything isn't encrypted then hackers will be able
>> to get access to everything" and that's just not a real argument.
>>
>
>Huh. That is *exactly* the argument I made. Though granted the example was
>on multixact primarily, because I think that is much more likely to leak
>interesting information, but the basis certainly applies to all the
>metadata.
>

IMHO we should treat everything as a serious side-channel by default,
and only consider not encrypting it after presenting arguments why
that's not the case. So we shouldn't be starting with unencrypted clog
and waiting for folks to come up with attacks leveraging that.

Of course, it's impossible to prove that something is not a serious
side-channel (it might be safe on its own, but not necessarily when
combined with other side-channels). And it's not black-and-white, i.e.
the side-channel may be leaking so little information it's not worth
bothering with. And ultimately it's a trade-off between complexity of
implementation and severity of the side-channel.

But without at least trying to quantify the severity of the side-channel
we can't really have a discussion whether it's OK not to encrypt clog,
whether it can be omitted from v1 etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Thu, Oct 03, 2019 at 01:26:55PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On Thu, Oct 03, 2019 at 11:51:41AM -0400, Stephen Frost wrote:
>> >* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> >>On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
>> >>>People who are looking for 'encrypt all the things' should and will be
>> >>>looking at filesytem-level encryption options.  That's not what this
>> >>>feature is about.
>> >>
>> >>That's almost certainly not true, at least not universally.
>> >>
>> >>It may be true for some people, but a lot of the people asking for
>> >>in-database encryption essentially want to do filesystem encryption but
>> >>can't use it for various reasons. E.g. because they're running in
>> >>environments that make filesystem encryption impossible to use (OS not
>> >>supporting it directly, no access to the block device, lack of admin
>> >>privileges, ...). Or maybe they worry about people with fs access.
>> >
>> >Anyone coming from other database systems isn't asking for that though
>> >and it wouldn't be a comparable offering to other systems.
>>
>> I don't think that's quite accurate. In the previous message you claimed
>> (1) this isn't what other database systems do and (2) people who want to
>> encrypt everything should just use fs encryption, because that's not
>> what TDE is about.
>>
>> Regarding (1), I'm pretty sure Oracle TDE does pretty much exactly this,
>> at least in the mode with tablespace-level encryption. It's true there
>> is also column-level mode, but from my experience it's far less used
>> because it has a number of annoying limitations.
>
>We're probably being too general and that's ending up with us talking
>past each other.  Yes, Oracle provides tablespace and column level
>encryption, but neither case results in *everything* being encrypted.
>

Possibly. There are far too many different TDE definitions in all those
various threads.

>> So I'm somewhat puzzled by your claim that people coming from other
>> systems are asking for the column-level mode. At least I'm assuming
>> that's what they're asking for, because I don't see other options.
>
>I've seen asks for tablespace, table, and column-level, but it's always
>been about the actual data.  Something like clog is an entirely internal
>structure that doesn't include the actual data.  Yes, it's possible it
>could somehow be used for a side-channel attack, as could other things,
>such as WAL, and as such I'm not sure that forcing a policy of "encrypt
>everything" is actually a sensible approach and it definitely adds
>complexity and makes it a lot more difficult to come up with a sensible
>solution.
>

IMHO the proven design principle is "deny all" by default, i.e. we
should start with the assumption that clog is encrypted and then present
arguments why it's not needed. Maybe it's 100% fine and we don't need to
encrypt it, or maybe it's a minor information leak and is not worth the
extra complexity, or maybe it's not needed for v1. But how do you know?
I don't think that discussion happened anywhere in those threads.


>> >>If you look at the two threads discussing the FDE design, both of
>> >>them pretty much started as "let's do FDE in the database".
>> >
>> >And that's how some folks continue to see it- let's just encrypt all the
>> >things, until they actually look at it and start thinking about what
>> >that means and how to implement it.
>>
>> This argument also works the other way, though. On Oracle, people often
>> start with the column-level encryption because it seems naturally
>> superior (hey, I can encrypt just the columns I want, ...) and then they
>> start running into the various limitations and eventually just switch to
>> the tablespace-level encryption.
>>
>> Now, maybe we'll be able to solve those limitations - but I think it's
>> pretty unlikely, because those limitations seem quite inherent to how
>> encryption affects indexes etc.
>
>It would probably be useful to discuss the specific limitations that
>you've seen causes people to move away from column-level encryption.
>
>I definitely agree that figuring out how to make things work with
>indexes is a non-trivial challenge, though I'm hopeful that we can come
>up with something sensible.
>

Hope is hardly something we should use to drive design decisions ...

As for the limitations, the column-level limitations in Oracle, this is
what the docs [1] say:

----- <quote> -----
Do not use TDE column encryption with the following database features:

    Index types other than B-tree

    Range scan search through an index

    Synchronous change data capture

    Transportable tablespaces

    Columns that have been created as identity columns

In addition, you cannot use TDE column encryption to encrypt columns
used in foreign key constraints.
----- </quote> -----

Now, some of that is obviously specific to Oracle, but at least some of
it seems to affect us too - certainly range scans through indexes,
possibly data capture (I believe that's mostly logical decoding),
non-btree indexes and identity columns.
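
To make the range-scan limitation concrete, here's a small sketch (not
from the thread; the HMAC call is just a stand-in for deterministic
encryption, and the key/values are made up). The point is that ciphertext
order is unrelated to plaintext order, so a B-tree over encrypted values
can serve equality lookups at best, never range scans:

```python
# Toy demonstration of why an index over encrypted column values cannot
# support range scans: the HMAC below stands in for deterministic
# encryption, and its output order has nothing to do with the
# plaintext order.
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical key, for illustration only

def pseudo_encrypt(value: int) -> bytes:
    return hmac.new(KEY, str(value).encode(), hashlib.sha256).digest()

salaries = list(range(1000, 2000, 50))            # plaintext, sorted
ciphertexts = [pseudo_encrypt(s) for s in salaries]

# Equality lookups still work: equal plaintexts encrypt identically.
assert pseudo_encrypt(1500) == ciphertexts[10]

# But ordering is destroyed, so a B-tree over the ciphertext cannot
# answer "WHERE salary BETWEEN 1200 AND 1400" with a range scan.
print(sorted(ciphertexts) == ciphertexts)         # almost certainly False
```

This is also why non-btree index types and expression indexes over
encrypted columns tend to be off the table.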

Oracle also has a handy "TDE best practices" document [2], which says
when to use column-level encryption - let me quote a couple of points:

* Location of sensitive information is known

* Less than 5% of all application columns are encryption candidates

* Encryption candidates are not foreign-key columns

* Indexes over encryption candidates are normal B-tree indexes (this
  also means no support for indexes on expressions, and likely partial
  indexes)

* No support from hardware crypto acceleration.

Now, maybe we can relax some of those limitations, or maybe those
limitations are acceptable for some applications. But it certainly does
not seem like a clearly superior choice.

There are other interesting arguments in that [2], it's worth a read.

>> >Yeah, it'd be great to just encrypt everything, with a bunch of
>> >different keys, all of which are stored somewhere else, and can be
>> >updated and changed by the user when they need to do a rekeying, but
>> >then you start have to asking about what keys need to be available when
>> >for doing crash recovery, how do you handle a crash in the middle of a
>> >rekeying, how do you handle updating keys from the user, etc..
>> >
>> >Sure, we could offer a dead simple "here, use this one key at database
>> >start to just encrypt everything" and that would be enough for some set
>> >of users (a very small set, imv, but that's subjective, obviously), but
>> >I don't think we could dare promote that as having TDE because it
>> >wouldn't be at all comparable to what other databases have, and it
>> >wouldn't materially move us in the direction of having real TDE.
>>
>> I think that very much depends on the definition of what "real TDE".  I
>> don't know what exactly that means at this point. And as I said before,
>> I think such simple mode *is* comparable to (at least some) solutions
>> available in other databases (as explained above).
>
>When I was researching this, I couldn't find any example of a database
>that wouldn't start without the one magic key that encrypts everything.
>I'm happy to be told that I was wrong in my understanding of that, with
>some examples.
>
>> As for the users, I don't have any objective data about this, but I
>> think the amount of people wanting such simple solution is non-trivial.
>> That does not mean we can't extend it to support more advanced features.
>
>The concern that I raised before and that I continue to worry about is
>that providing such a simple capability will have a lot of limitations
>too (such as having a single key and only being able to rekey during a
>complete downtime, because we have to re-encrypt clog, etc, etc), and
>I don't see it helping us get to more granular TDE because, for that,
>where we really need to start is by building a vault of some kind to
>store the keys in and then figuring out how we do things like crash
>recovery in a sensible way and, ideally, without needing to have access
>to all of (any of?) the keys.
>

Eh? I don't think that "simple mode" has to use a single encryption key
internally, I think the design with single *master* key and multiple
encryption keys works just fine. So when changing the master key, it's
enough to re-encrypt the encryption keys. No need for a downtime etc.

Of course, in some cases it may be desirable to change those encryption
keys too, but that seems like a pretty inherent feature.
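
As a rough illustration of that two-level scheme (nothing here is the
actual proposed implementation; the SHA-256 XOR keystream merely stands
in for a real cipher such as AES key wrap, and all names are made up):

```python
# Toy sketch of a master-key / data-key hierarchy.  The SHA-256
# keystream XOR is a stand-in for a real cipher -- do not use it for
# actual encryption.
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

# Data encryption keys (DEKs) actually encrypt relation/WAL data.
deks = {"heap": os.urandom(32), "wal": os.urandom(32)}

# The master key only wraps the DEKs.
master = os.urandom(32)
wrapped = {name: keystream_xor(master, dek) for name, dek in deks.items()}

# Rekeying the master re-wraps only the small DEKs; the data encrypted
# under the DEKs is untouched -- no rewrite, no downtime.
new_master = os.urandom(32)
rewrapped = {name: keystream_xor(new_master, keystream_xor(master, w))
             for name, w in wrapped.items()}

for name in deks:
    assert keystream_xor(new_master, rewrapped[name]) == deks[name]
```

Rotating the master key this way touches a few dozen bytes of wrapped
key material rather than the data files themselves.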

>> >>>>I'm not sold on the comments that have been made about encrypting the
>> >>>>server log. I agree that could leak data, but that seems like somebody
>> >>>>else's problem: the log files aren't really under PostgreSQL's
>> >>>>management in the same way as pg_clog is. If you want to secure your
>> >>>>logs, send them to syslog and configure it to do whatever you need.
>> >>>
>> >>>I agree with this.
>> >>
>> >>I don't. I know it's not an easy problem to solve, but it may contain
>> >>user data (which is what we manage). We may allow disabling that, at
>> >>which point it becomes someone else's problem.
>> >
>> >We also send user data to clients, but I don't imagine we're suggesting
>> >that we need to control what some downstream application does with that
>> >data or how it gets stored.  There's definitely a lot of room for
>> >improvement in our logging (in an ideal world, we'd have a way to
>> >actually store the logs in the database, at which point it could be
>> >encrypted or not that way...), but I'm not seeing the need for us to
>> >have a way to encrypt the log files.  If we did encrypt them, we'd have
>> >to make sure to do it in a way that users could still access them
>> >without the database being up and running, which might be tricky if the
>> >key is in the vault...
>>
>> That's a bit of a straw-man argument, really. The client is obviously
>> meant to receive and handle sensitive data, that's its main purpose.
>> For logging systems the situation is a bit different, it's a general
>> purpose tool, with no idea what the data is.
>
>The argument you're making is that the log isn't intended to have
>sensitive data, but while that might be a nice place to get to, we
>certainly aren't there today, which means that people should really be
>sending the logs to a location that's trusted.
>

Which means they can't really send it anywhere, because they don't have
control over what will be in error messages etc.

Let me quote the PCI DSS standard, which seems like a good example:

    3.4 Render Primary Account Number (PAN), at minimum, unreadable
    anywhere it is stored (including data on portable digital media,
    backup media, in logs) by using any of the following approaches:

    * One-way hashes based on strong cryptography

    * Truncation

    * Index tokens and pads (pads must be securely stored)

    * Strong cryptography with associated key management processes and
      procedures.

I'm no PCI DSS expert, but how can you comply with this (assuming you
want to store PAN in the database) by only sending the data to trusted
systems?
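
For reference, two of the quoted 3.4 approaches are trivial to sketch
(the PAN below is a made-up example number; a real deployment would use
a keyed or salted hash, since the PAN space is small enough to brute
force a plain hash):

```python
# Sketch of two PCI DSS 3.4 approaches: truncation and a one-way hash.
import hashlib

pan = "4111111111111111"  # made-up example PAN

# Truncation: keep at most the first six and last four digits;
# everything in between is dropped, not merely masked.
truncated = pan[:6] + "*" * (len(pan) - 10) + pan[-4:]

# One-way hash based on strong cryptography (add a key/salt in practice).
digest = hashlib.sha256(pan.encode()).hexdigest()

print(truncated)   # 411111******1111
assert pan not in digest
```

The relevant point for this thread: neither transformation is something
a logging pipeline applies after the fact, which is why a PAN leaking
into the server log is a compliance problem.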

>> I do understand it's pretty pointless to send encrypted messages to such
>> external tools, but IMO it'd be good to implement that at least for our
>> internal logging collector.
>
>It's also less than user friendly to log to encrypted files that you
>can't read without having the database system being up, so we'd have to
>figure out at least a solution to that problem, and then if you have
>downstream systems where the logs are going to, you have to decrypt
>them, or have a way to have them not be encrypted perhaps.
>

I don't see why the database would have to be up, as long as the vault
is accessible somehow (i.e. I can imagine a tool for reading encrypted
logs, requesting the key from the same vault).

>In general, wrt the logs, I feel like it's at least a reasonably small
>and independent piece of this, though I wonder if it'll cause similar
>problems when it comes to dealing with crash recovery (how do we log if
>we don't have the key from the vault because we haven't done crash
>recovery yet, for example...).
>

Possibly, I don't have an opinion on this.

regards


[1] https://docs.oracle.com/en/database/oracle/oracle-database/18/asoag/configuring-transparent-data-encryption.html

[2] https://www.oracle.com/technetwork/database/security/twp-transparent-data-encryption-bes-130696.pdf

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Fri, Oct  4, 2019 at 10:46:57PM +0200, Tomas Vondra wrote:
> Oracle also has a handy "TDE best practices" document [2], which says
> when to use column-level encryption - let me quote a couple of points:
> 
> * Location of sensitive information is known
> 
> * Less than 5% of all application columns are encryption candidates
> 
> * Encryption candidates are not foreign-key columns
> 
> * Indexes over encryption candidates are normal B-tree indexes (this
>  also means no support for indexes on expressions, and likely partial
>  indexes)
> 
> * No support from hardware crypto acceleration.

Aren't all modern systems going to have hardware crypto acceleration,
i.e., AES-NI CPU extensions?  Does that mean there is no value in
partial encryption on such systems?  Looking at the overhead numbers I
have seen for AES-NI-enabled systems, I believe it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Fri, Oct 04, 2019 at 03:57:32PM -0400, Bruce Momjian wrote:
>On Fri, Oct  4, 2019 at 09:18:58AM -0400, Robert Haas wrote:
>> I think everyone would agree that if you have no information about a
>> database other than the contents of pg_clog, that's not a meaningful
>> information leak. You would be able to tell which transactions
>> committed and which transactions aborted, but since you know nothing
>> about the data inside those transactions, it's of no use to you.
>> However, in that situation, you probably wouldn't be attacking the
>> database in the first place. Most likely you have some knowledge about
>> what it contains. Maybe there's a stream of sensor data that flows
>> into the database, and you can see that stream.  By watching pg_clog,
>> you can see when a particular bit of data is rejected. That could be
>> valuable.
>
>It is certainly true that seeing activity in _any_ cluster file could
>leak information.  However, even if we encrypted all the cluster files,
>bad actors could still get information by analyzing the file sizes and
>size changes of relation files, and the speed of WAL creation, and even
>monitor WAL for write activity (WAL file byte changes).  I would think
>that would leak more information than clog.
>

Yes, those information leaks seem unavoidable. 

>I am not sure how you could secure against that information leak.  While
>file system encryption might do that at the storage layer, it doesn't do
>anything at the mounted file system layer.
>

That's because FDE is only meant to protect against a passive attacker,
essentially stealing the device. It's useless when someone gains access
to a mounted disk, so these information leaks are irrelevant.

(I'm only talking about encryption at the block device level. I'm not
sure about details e.g. for the encryption built into ext4, etc.)

>The current approach is to encrypt anything that contains user data,
>which includes heap, index, and WAL files.  I think replication slots
>and logical replication might also fall into that category, which is why
>I started this thread.
>

Yes, I think those bits have to be encrypted too.

BTW I'm not sure why you list replication slots and logical replication
independently, those are mostly the same thing I think. For physical
slots we probably don't need to encrypt anything, but for logical slots
we may spill decoded data to files (so those will contain user data).

>I can see some saying that all cluster files should be encrypted, and I
>can respect that argument.  However, as outlined in the diagram linked
>to from the blog entry:
>
>    https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019
>
>I feel that TDE, since it has limited value, and can't really avoid all
>information leakage, should strive to find the intersection of ease of
>implementation, security, and compliance.  If people don't think that
>limited file encryption is secure, I get it.  However, encrypting most
>or all files I think would lead us into such a "difficult to implement"
>scope that I would no longer be able to work on this feature.  I think
>the code complexity, fragility, potential unreliability, and even
>overhead of trying to encrypt most/all files would lead TDE to be
>greatly delayed or never implemented.  I just couldn't recommend it.
>Now, I might be totally wrong, and encryption of everything might be
>just fine, but I have to pick my projects, and such an undertaking seems
>far too risky for me.
>

I agree some trade-offs will be needed, to make the implementation at
all possible (irrespective of the exact design). But I think those
trade-offs need to be conscious, based on some technical arguments why
it's OK to consider a particular information leak acceptable, etc. For
example it may be fine when assuming the attacker only gets a single
static copy of the data directory, but not when having the ability to
observe changes made by a running instance.

In a way, my concern is somewhat the opposite of yours - that we'll end
up with a feature (which necessarily adds complexity) that however does
not provide sufficient security for various use cases.

And I don't know where exactly the middle ground is, TBH.

>Just for some detail, we have solved the block-level encryption problem
>by using CTR mode in most cases, but there is still a requirement for a
>nonce for every encryption operation.  You can use derived keys too, but
>you need to set up those keys for every write to encrypt files.  Maybe
>it is possible to set up a write API that handles this transparently in
>the code, but I don't know how to do that cleanly, and I doubt if the
>value of encrypting everything is worth it.
>
>As far as encrypting the log file, I can see us adding documentation to
>warn about that, and even issue a server log message if encryption is
>enabled and syslog is not being used.  (I don't know how to test if
>syslog is being shipped to a remote server.)
>

Not sure. I wonder if it's possible to set up syslog so that it encrypts
the data on storage, and if that would be a suitable solution e.g. for
PCI DSS purposes. (It seems at least rsyslogd supports that.)


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Fri, Oct 04, 2019 at 04:58:14PM -0400, Bruce Momjian wrote:
>On Fri, Oct  4, 2019 at 10:46:57PM +0200, Tomas Vondra wrote:
>> Oracle also has a handy "TDE best practices" document [2], which says
>> when to use column-level encryption - let me quote a couple of points:
>>
>> * Location of sensitive information is known
>>
>> * Less than 5% of all application columns are encryption candidates
>>
>> * Encryption candidates are not foreign-key columns
>>
>> * Indexes over encryption candidates are normal B-tree indexes (this
>>  also means no support for indexes on expressions, and likely partial
>>  indexes)
>>
>> * No support from hardware crypto acceleration.
>
>Aren't all modern systems going to have hardware crypto acceleration,
>i.e., AES-NI CPU extensions.  Does that mean there is no value of
>partial encryption on such systems?  Looking at the overhead numbers I
>have seen for AES-NI-enabled systems, I believe it.
>


That's a good question, I don't know the answer. You're right most
systems have CPUs with AES-NI these days, and I'm not sure why the
column encryption does not leverage that.

Maybe it's because column encryption has to encrypt/decrypt much smaller
chunks of data, and AES-NI is not efficient for that? I don't know.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Fri, Oct  4, 2019 at 11:31:00PM +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 03:57:32PM -0400, Bruce Momjian wrote:
> > The current approach is to encrypt anything that contains user data,
> > which includes heap, index, and WAL files.  I think replication slots
> > and logical replication might also fall into that category, which is why
> > I started this thread.
> 
> Yes, I think those bits have to be encrypted too.
> 
> BTW I'm not sure why you list replication slots and logical replication
> independently, those are mostly the same thing I think. For physical
> slots we probably don't need to encrypt anything, but for logical slots
> we may spill decoded data to files (so those will contain user data).

In this thread, I am really looking for experts who can explain exactly
where sensitive data is stored in PGDATA.  Oh, pgsql_tmp must be
encrypted too.  I would say we know which things must be encrypted, but
we now need to go through the rest of PGDATA to determine which parts
are safe to leave unencrypted, and which must be encrypted.

> > I can see some saying that all cluster files should be encrypted, and I
> > can respect that argument.  However, as outlined in the diagram linked
> > to from the blog entry:
> > 
> >     https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019
> > 
> > I feel that TDE, since it has limited value, and can't really avoid all
> > information leakage, should strive to find the intersection of ease of
> > implementation, security, and compliance.  If people don't think that
> > limited file encryption is secure, I get it.  However, encrypting most
> > or all files I think would lead us into such a "difficult to implement"
> > scope that I would no longer be able to work on this feature.  I think
> > the code complexity, fragility, potential unreliability, and even
> > overhead of trying to encrypt most/all files would lead TDE to be
> > greatly delayed or never implemented.  I just couldn't recommend it.
> > Now, I might be totally wrong, and encryption of everything might be
> > just fine, but I have to pick my projects, and such an undertaking seems
> > far too risky for me.
> > 
> 
> I agree some trade-offs will be needed, to make the implementation at
> all possible (irrespective of the exact design). But I think those
> trade-offs need to be conscious, based on some technical arguments why
> it's OK to consider a particular information leak acceptable, etc. For
> example it may be fine when assuming the attacker only gets a single
> static copy of the data directory, but not when having the ability to
> observe changes made by a running instance.

Yes, we need to be explicit about what we don't encrypt --- and why it
is reasonably safe.

> In a way, my concern is somewhat the opposite of yours - that we'll end
> up with a feature (which necessarily adds complexity) that however does
> not provide sufficient security for various use cases.

Yep, if we can't do it safely, there is no point in doing it.

> And I don't know where exactly the middle ground is, TBH.

We spend a lot of time figuring out exactly how to safely encrypt WAL,
heap, index, and pgsql_tmp files.   The idea of doing this for another
20 types of files --- to find a safe nonce, to be sure a file rewrite
doesn't reuse the nonce, figuring out the API, crash recovery, forensics,
tool interface --- is something I would like to avoid.  I want to avoid
it not because I don't like work, but because I am afraid the code
impact and fragility will doom the feature.

> > Just for some detail, we have solved the block-level encryption problem
> > by using CTR mode in most cases, but there is still a requirement for a
> > nonce for every encryption operation.  You can use derived keys too, but
> > you need to set up those keys for every write to encrypt files.  Maybe
> > it is possible to set up a write API that handles this transparently in
> > the code, but I don't know how to do that cleanly, and I doubt if the
> > value of encrypting everything is worth it.
> > 
> > As far as encrypting the log file, I can see us adding documentation to
> > warn about that, and even issue a server log message if encryption is
> > enabled and syslog is not being used.  (I don't know how to test if
> > syslog is being shipped to a remote server.)
> > 
> 
> Not sure. I wonder if it's possible to setup syslog so that it encrypts
> the data on storage, and if that would be a suitable solution e.g. for
> PCI DSS purposes. (It seems at least rsyslogd supports that.)

Well, users don't want the data visible in a mounted file system, which
is why we were thinking a remote secure server would help.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Fri, Oct  4, 2019 at 11:48:19PM +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 04:58:14PM -0400, Bruce Momjian wrote:
> > On Fri, Oct  4, 2019 at 10:46:57PM +0200, Tomas Vondra wrote:
> > > Oracle also has a handy "TDE best practices" document [2], which says
> > > when to use column-level encryption - let me quote a couple of points:
> > > 
> > > * Location of sensitive information is known
> > > 
> > > * Less than 5% of all application columns are encryption candidates
> > > 
> > > * Encryption candidates are not foreign-key columns
> > > 
> > > * Indexes over encryption candidates are normal B-tree indexes (this
> > >  also means no support for indexes on expressions, and likely partial
> > >  indexes)
> > > 
> > > * No support from hardware crypto acceleration.
> > 
> > Aren't all modern systems going to have hardware crypto acceleration,
> > i.e., AES-NI CPU extensions.  Does that mean there is no value of
> > partial encryption on such systems?  Looking at the overhead numbers I
> > have seen for AES-NI-enabled systems, I believe it.
> > 
> 
> 
> That's a good question, I don't know the answer. You're right most
> systems have CPUs with AES-NI these days, and I'm not sure why the
> column encryption does not leverage that.
> 
> Maybe it's because column encryption has to encrypt/decrypt much smaller
> chunks of data, and AES-NI is not efficient for that? I don't know.

For full-cluster TDE with AES-NI-enabled, the performance impact is
usually ~4%, so doing anything more granular doesn't seem useful.  See
this PGCon presentation with charts:

    https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s

Having anything more fine-grained than all-cluster didn't seem worth it. 
Using per-user keys is useful, but also much harder to implement.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:
>On Fri, Oct  4, 2019 at 11:48:19PM +0200, Tomas Vondra wrote:
>> On Fri, Oct 04, 2019 at 04:58:14PM -0400, Bruce Momjian wrote:
>> > On Fri, Oct  4, 2019 at 10:46:57PM +0200, Tomas Vondra wrote:
>> > > Oracle also has a handy "TDE best practices" document [2], which says
>> > > when to use column-level encryption - let me quote a couple of points:
>> > >
>> > > * Location of sensitive information is known
>> > >
>> > > * Less than 5% of all application columns are encryption candidates
>> > >
>> > > * Encryption candidates are not foreign-key columns
>> > >
>> > > * Indexes over encryption candidates are normal B-tree indexes (this
>> > >  also means no support for indexes on expressions, and likely partial
>> > >  indexes)
>> > >
>> > > * No support from hardware crypto acceleration.
>> >
>> > Aren't all modern systems going to have hardware crypto acceleration,
>> > i.e., AES-NI CPU extensions.  Does that mean there is no value of
>> > partial encryption on such systems?  Looking at the overhead numbers I
>> > have seen for AES-NI-enabled systems, I believe it.
>> >
>>
>>
>> That's a good question, I don't know the answer. You're right most
>> systems have CPUs with AES-NI these days, and I'm not sure why the
>> column encryption does not leverage that.
>>
>> Maybe it's because column encryption has to encrypt/decrypt much smaller
>> chunks of data, and AES-NI is not efficient for that? I don't know.
>
>For full-cluster TDE with AES-NI-enabled, the performance impact is
>usually ~4%, so doing anything more granular doesn't seem useful.  See
>this PGCon presentation with charts:
>
>    https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s
>
>Having anything more fine-grained than all-cluster didn't seem worth it.
>Using per-user keys is useful, but also much harder to implement.
>

Not sure I follow. I thought you were asking why Oracle apparently does
not leverage AES-NI for column-level encryption (at least according to
the document I linked)? And I don't know why that's the case.

FWIW performance is just one (supposed) benefit of column encryption,
even if all-cluster encryption is just as fast, there might be other
reasons to support it.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Sat, Oct  5, 2019 at 12:54:35AM +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:
> > For full-cluster TDE with AES-NI-enabled, the performance impact is
> > usually ~4%, so doing anything more granular doesn't seem useful.  See
> > this PGCon presentation with charts:
> > 
> >     https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s
> > 
> > Having anything more fine-grained than all-cluster didn't seem worth it.
> > Using per-user keys is useful, but also much harder to implement.
> > 
> 
> Not sure I follow. I thought you are asking why Oracle apparently does
> not leverage AES-NI for column-level encryption (at least according to
> the document I linked)? And I don't know why that's the case.

No, I read it as Oracle saying that there isn't much value to per-column
encryption if you have crypto hardware acceleration, because the
all-cluster encryption overhead is so minor.

> FWIW performance is just one (supposed) benefit of column encryption,
> even if all-cluster encryption is just as fast, there might be other
> reasons to support it.

Well, there is per-user/db encryption, but I think that needs to be done
at the SQL level.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Fri, Oct 04, 2019 at 08:14:44PM -0400, Bruce Momjian wrote:
>On Sat, Oct  5, 2019 at 12:54:35AM +0200, Tomas Vondra wrote:
>> On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:
>> > For full-cluster TDE with AES-NI-enabled, the performance impact is
>> > usually ~4%, so doing anything more granular doesn't seem useful.  See
>> > this PGCon presentation with charts:
>> >
>> >     https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s
>> >
>> > Having anything more fine-grained than all-cluster didn't seem worth it.
>> > Using per-user keys is useful, but also much harder to implement.
>> >
>>
>> Not sure I follow. I thought you are asking why Oracle apparently does
>> not leverage AES-NI for column-level encryption (at least according to
>> the document I linked)? And I don't know why that's the case.
>
>No, I read it as Oracle saying that there isn't much value to per-column
>encryption if you have crypto hardware acceleration, because the
>all-cluster encryption overhead is so minor.
>

So essentially the argument is - if you have hw crypto acceleration (aka
AES-NI), then the overhead of all-cluster encryption is so low it does
not make sense to bother with lowering it with column encryption.

IMO that's a good argument against column encryption (at least when used
to reduce overhead), although 10% is still quite a bit.

But I'm not sure it's what the document is saying. I'm sure if they
could, they'd use AES-NI even for column encryption, to make it more
efficient. Because why wouldn't you do that? But the doc explicitly
says:

    Hardware cryptographic acceleration for TDE column encryption is
    not supported.

So there has to be a reason why that's not supported. Either there's
something that prevents this mode from using AES-NI at all, or it simply
can't be sped up.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Fri, Oct 4, 2019 at 5:49 PM Bruce Momjian <bruce@momjian.us> wrote:
> We spend a lot of time figuring out exactly how to safely encrypt WAL,
> heap, index, and pgsql_tmp files.   The idea of doing this for another
> 20 types of files --- to find a safe nonce, to be sure a file rewrite
> doesn't reuse the nonce, figuring the API, crash recovery, forensics,
> tool interface --- is something I would like to avoid.  I want to avoid
> it not because I don't like work, but because I am afraid the code
> impact and fragility will doom the feature.

I'm concerned about that, too, but there's no getting around the fact
that there are a bunch of types of files and that they do all need to
be dealt with. If we have a good scheme for doing that, hopefully
extending it to additional types of files is not that bad, which would
then spare us the trouble of arguing about each one individually, and
also be more secure.

As I also said to Stephen, the people who are discussing this here
should *really really really* be looking at the Cybertec patch instead
of trying to invent everything from scratch - unless that patch has,
like, typhoid, or something, in which case please let me know so that
I, too, can avoid looking at it. Even if you wanted to use 0% of the
code, you could look at the list of file types that they consider
encrypting and think about whether you agree with the decisions they
made. I suspect that you would quickly find that you've left some
things out of your list. In fact, I can think of a couple pretty clear
examples, like the stats files, which clearly contain user data.

Another reason that you should go look at that patch is because it
actually tries to grapple with the exact problem that you're worrying
about in the abstract: there are a LOT of different kinds of files and
they all need to be handled somehow. Even if you can convince yourself
that things like pg_clog don't need encryption, which I think is a
pretty tough sell, there are a LOT of file types that directly contain
user data and do need to be handled. A lot of the code that writes
those various types of files is pretty ad-hoc. It doesn't necessarily
do nice things like build up a block of data and then write it out
together; it may, for example, write a byte at a time. That's not going to
work well for encryption, I think, so the Cybertec patch changes that
stuff around. I personally don't think that the patch does that in a
way that is sufficiently clean and carefully considered for it to be
integrated into core, and my plan had been to work on that with the
patch authors.

However, that plan has been somewhat derailed by the fact that we now
have hundreds of emails arguing about the design, because I don't want
to be trying to push water up a hill if everyone else is going in a
different direction. It looks to me, though, like we haven't really
gotten beyond the point where that patch already was. The issues of
nonce and many file types have already been thought about carefully
there. I rather suspect that they did not get it all right. But, it
seems to me that it would be a lot more useful to look at the code
actually written and think about what it gets right and wrong than to
discuss these points as a strictly theoretical matter.

In other words: maybe I'm wrong here, but it looks to me like we're
laboriously reinventing the wheel when we could be working on
improving the working prototype.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Sat, Oct  5, 2019 at 09:13:59PM +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 08:14:44PM -0400, Bruce Momjian wrote:
> > On Sat, Oct  5, 2019 at 12:54:35AM +0200, Tomas Vondra wrote:
> > > On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:
> > > > For full-cluster TDE with AES-NI-enabled, the performance impact is
> > > > usually ~4%, so doing anything more granular doesn't seem useful.  See
> > > > this PGCon presentation with charts:
> > > >
> > > >     https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s
> > > >
> > > > Having anthing more fine-grained that all-cluster didn't seem worth it.
> > > > Using per-user keys is useful, but also much harder to implement.
> > > >
> > > 
> > > Not sure I follow. I thought you are asking why Oracle apparently does
> > > not leverage AES-NI for column-level encryption (at least according to
> > > the document I linked)? And I don't know why that's the case.
> > 
> > No, I read it as Oracle saying that there isn't much value to per-column
> > encryption if you have crypto hardware acceleration, because the
> > all-cluster encryption overhead is so minor.
> > 
> 
> So essentially the argument is - if you have hw crypto acceleration (aka
> AES-NI), then the overhead of all-cluster encryption is so low it does
> not make sense to bother with lowering it with column encryption.

Yes, I think that is true.  Column-level encryption can be useful in
giving different people control of the keys, but I think that feature
should be developed at the SQL level so clients can unlock the key and
backups include the encryption keys.

> IMO that's a good argument against column encryption (at least when used
> to reduce overhead), although 10% still quite a bit.

I think that test was a worst-case one and I think it needs to be
optimized before we draw any conclusions.

> But I'm not sure it's what the document is saying. I'm sure if they
> could, they'd use AES-NI even for column encryption, to make it more
> efficient. Because why wouldn't you do that? But the doc explicitly
> says:
> 
>    Hardware cryptographic acceleration for TDE column encryption is
>    not supported.

Oh, wow, that is something!

> So there has to be a reason why that's not supported. Either there's
> something that prevents this mode from using AES-NI at all, or it simply
> can't be sped-up.

Yeah, good question.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Mon, Oct  7, 2019 at 09:44:30AM -0400, Robert Haas wrote:
> On Fri, Oct 4, 2019 at 5:49 PM Bruce Momjian <bruce@momjian.us> wrote:
> > We spend a lot of time figuring out exactly how to safely encrypt WAL,
> > heap, index, and pgsql_tmp files.   The idea of doing this for another
> > 20 types of files --- to find a safe nonce, to be sure a file rewrite
> > doesn't reuse the nonce, figuring the API, crash recovery, forensics,
> > tool interface --- is something I would like to avoid.  I want to avoid
> > it not because I don't like work, but because I am afraid the code
> > impact and fragility will doom the feature.
> 
> I'm concerned about that, too, but there's no getting around the fact
> that there are a bunch of types of files and that they do all need to
> be dealt with. If we have a good scheme for doing that, hopefully
> extending it to additional types of files is not that bad, which would
> then spare us the trouble of arguing about each one individually, and
> also be more secure.

Well, to do encryption properly, there is the requirement of the nonce. 
If you ever rewrite a bit, you technically have to have a new nonce. 
For WAL, since it is append-only, you can use the WAL file name.  For
heap/index files, we change the LSN on every rewrite (with
wal_log_hints=on), and we never use the same LSN for writing multiple
relations, so LSN+page-offset is a sufficient nonce.

For clog, it is not append-only, and bytes are rewritten (from zero to
non-zero), so there would have to be a new nonce for every clog file
write to the file system.  We can store the nonce in a separate file,
but the clog contents and nonce would have to be always synchronized or
the file could not be properly read.  Basically every file we want to
encrypt, needs this kind of study.
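The nonce rules described above can be sketched in a few lines. This is an illustrative toy, not the actual patch: it uses a SHA-256-based keystream as a stand-in for AES-CTR, and the IV layout (8-byte LSN plus 8-byte block number) is an assumption for demonstration only. The second assertion shows why nonce reuse is fatal in a CTR-style scheme: two pages encrypted under the same IV leak the XOR of their plaintexts.

```python
import hashlib
import os
import struct

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy CTR-style keystream: hash(key || iv || counter).
    # A real implementation would use AES-CTR with AES-NI.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + struct.pack(">Q", counter)).digest()
        counter += 1
    return out[:length]

def page_iv(lsn: int, block_no: int) -> bytes:
    # LSN + block number as the per-page nonce: unique as long as the
    # LSN advances on every page rewrite (wal_log_hints=on).
    return struct.pack(">QQ", lsn, block_no)

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

key = os.urandom(32)
page = b"user row data" + b"\x00" * 19

# Encrypt, then decrypt with the same (LSN, block) IV: round-trips.
ct = xor(page, keystream(key, page_iv(0x1A2B3C, 7), len(page)))
assert xor(ct, keystream(key, page_iv(0x1A2B3C, 7), len(page))) == page

# Nonce reuse: a second page encrypted under the SAME IV reveals
# the XOR of the two plaintexts to any observer of both ciphertexts.
other = b"other secret dat" + b"\x00" * 16
ct2 = xor(other, keystream(key, page_iv(0x1A2B3C, 7), len(other)))
assert xor(ct, ct2) == xor(page, other)
```

This is why a rewritten clog byte forces a fresh nonce: writing the same page twice under one IV hands the attacker exactly this XOR relation.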

> As I also said to Stephen, the people who are discussing this here
> should *really really really* be looking at the Cybertec patch instead
> of trying to invent everything from scratch - unless that patch has,

Someone from Cybertec is on the voice calls we have, and is actively
involved.

> like, typhoid, or something, in which case please let me know so that
> I, too, can avoid looking at it. Even if you wanted to use 0% of the
> code, you could look at the list of file types that they consider
> encrypting and think about whether you agree with the decisions they
> made. I suspect that you would quickly find that you've left some
> things out of your list. In fact, I can think of a couple pretty clear
> examples, like the stats files, which clearly contain user data.

I am asking here because I don't think the Cybertec approach has gotten
enough study compared to what this group can contribute.

> Another reason that you should go look at that patch is because it
> actually tries to grapple with the exact problem that you're worrying
> about in the abstract: there are a LOT of different kinds of files and
> they all need to be handled somehow. Even if you can convince yourself
> that things like pg_clog don't need encryption, which I think is a
> pretty tough sell, there are LOT of file types that directly contain
> user data and do need to be handled. A lot of the code that writes
> those various types of files is pretty ad-hoc. It doesn't necessarily
> do nice things like build up a block of data and then write it out
> together; it may for example write a byte a time. That's not going to
> work well for encryption, I think, so the Cybertec patch changes that

Actually, byte-at-a-time works fine with CTR mode, though that mode is
very sensitive to the reuse of the nonce since the user data is not part
of the input for future encryption blocks.
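The byte-at-a-time point can be sketched as follows (again with a hash-based keystream standing in for AES-CTR, purely for illustration): because the keystream at any byte offset is computable independently, encrypting one byte at a time produces exactly the same ciphertext as encrypting the whole buffer at once, so ad-hoc byte-wise writers need no block-level restructuring under CTR.

```python
import hashlib
import struct

BLOCK = 32  # SHA-256 digest size, our toy keystream block

def keystream(key: bytes, iv: bytes, offset: int, length: int) -> bytes:
    # CTR property: any slice of the keystream is computable from the
    # counter alone, independent of previously encrypted data.
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    out = b""
    for counter in range(first, last + 1):
        out += hashlib.sha256(key + iv + struct.pack(">Q", counter)).digest()
    start = offset - first * BLOCK
    return out[start:start + length]

def encrypt_at(key: bytes, iv: bytes, offset: int, data: bytes) -> bytes:
    ks = keystream(key, iv, offset, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key, iv = b"k" * 32, b"i" * 16
msg = b"written one byte at a time"

whole = encrypt_at(key, iv, 0, msg)
bytewise = b"".join(encrypt_at(key, iv, i, msg[i:i + 1])
                    for i in range(len(msg)))
assert whole == bytewise  # identical ciphertext either way
```

The flip side, as noted above, is that this only holds if the (key, IV) pair is never reused for different data.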

> stuff around. I personally don't think that the patch does that in a
> way that is sufficiently clean and carefully considered for it to be
> integrated into core, and my plan had been to work on that with the
> patch authors.
> 
> However, that plan has been somewhat derailed by the fact that we now
> have hundreds of emails arguing about the design, because I don't want
> to be trying to push water up a hill if everyone else is going in a
> different direction. It looks to me, though, like we haven't really
> gotten beyond the point where that patch already was. The issues of
> nonce and many file types have already been thought about carefully
> there. I rather suspect that they did not get it all right. But, it
> seems to me that it would be a lot more useful to look at the code
> actually written and think about what it gets right and wrong than to
> discuss these points as a strictly theoretical matter.
> 
> In other words: maybe I'm wrong here, but it looks to me like we're
> laboriously reinventing the wheel when we could be working on
> improving the working prototype.

The work being done is building on that prototype.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Mon, Oct 7, 2019 at 11:02 AM Bruce Momjian <bruce@momjian.us> wrote:
> For clog, it is not append-only, and bytes are rewritten (from zero to
> non-zero), so there would have to be a new nonce for every clog file
> write to the file system.  We can store the nonce in a separate file,
> but the clog contents and nonce would have to be always synchronized or
> the file could not be properly read.  Basically every file we want to
> encrypt, needs this kind of study.

Yeah. It's a big problem/project.

Another approach to this problem would be to adjust the block format
to leave room for the nonce. If encryption is not in use, then those
bytes would just be zeroed or something. That would make upgrading a
bit tricky, but pg_upgrade could be taught to do the necessary
conversions for SLRUs without too much pain, I think.
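As a rough illustration of that idea (the sizes and field order here, a 16-byte nonce slot and 4-byte checksum carved out of an 8 kB SLRU page, are illustrative assumptions, not a concrete proposal for the on-disk format):

```python
import struct

PAGE_SIZE = 8192
NONCE_LEN = 16      # reserved in the page even when encryption is off
CRC_LEN = 4
PAYLOAD_LEN = PAGE_SIZE - NONCE_LEN - CRC_LEN

def pack_page(payload: bytes, nonce: bytes = b"\x00" * NONCE_LEN) -> bytes:
    # Unencrypted clusters write an all-zero nonce field, so the page
    # layout is identical whether or not encryption is in use.
    assert len(payload) == PAYLOAD_LEN and len(nonce) == NONCE_LEN
    crc = struct.pack(">I", sum(payload) & 0xFFFFFFFF)  # toy checksum
    return nonce + crc + payload

def upgrade_slru_page(old_page: bytes) -> bytes:
    # pg_upgrade-style conversion sketch: the old full-size page loses
    # NONCE_LEN + CRC_LEN bytes of payload per page, so the overflow
    # would be carried into the next page (not shown here).
    return pack_page(old_page[:PAYLOAD_LEN])

new_page = upgrade_slru_page(b"\x01" * PAGE_SIZE)
assert len(new_page) == PAGE_SIZE
assert new_page[:NONCE_LEN] == b"\x00" * NONCE_LEN
```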

In my opinion, it is desirable to maintain as much consistency as
possible between what we store on disk in the encrypted case and what
we store on disk in the not-encrypted case.  If we have to add
additional forks in the encrypted case, or change the format of the
file and not just the contents, it seems likely to add complexity
and bugs that we might be able to avoid via another approach.

> > In other words: maybe I'm wrong here, but it looks to me like we're
> > laboriously reinventing the wheel when we could be working on
> > improving the working prototype.
>
> The work being done is building on that prototype.

That's good, but then I'm puzzled as to why your list of things to
encrypt doesn't include all the things it already covers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Mon, Oct  7, 2019 at 11:26:24AM -0400, Robert Haas wrote:
> On Mon, Oct 7, 2019 at 11:02 AM Bruce Momjian <bruce@momjian.us> wrote:
> > For clog, it is not append-only, and bytes are rewritten (from zero to
> > non-zero), so there would have to be a new nonce for every clog file
> > write to the file system.  We can store the nonce in a separate file,
> > but the clog contents and nonce would have to be always synchronized or
> > the file could not be properly read.  Basically every file we want to
> > encrypt, needs this kind of study.
> 
> Yeah. It's a big problem/project.
> 
> Another approach to this problem would be to adjust the block format
> to leave room for the nonce. If encryption is not in use, then those
> bytes would just be zeroed or something. That would make upgrading a
> bit tricky, but pg_upgrade could be taught to do the necessary
> conversions for SLRUs without too much pain, I think.

Yes, that is exactly the complexity we have to deal with, in terms of
code complexity, reliability, and future maintenance.  Currently the
file format is unchanged, but as we add more encrypted files, we might
need to change it.  Fortunately, I think heap/index files don't need to
change, so pg_upgrade will not require changes.

> In my opinion, it is desirable to maintain as much consistency as
> possible between what we store on disk in the encrypted case and what
> we store on disk in the not-encrypted case.  If we have to add
> additional forks in the encrypted case, or change the file of the
> format and not just the contents, it seems likely to add complexity
> and bugs that we might be able to avoid via another approach.

Agreed.

> > > In other words: maybe I'm wrong here, but it looks to me like we're
> > > laboriously reinventing the wheel when we could be working on
> > > improving the working prototype.
> >
> > The work being done is building on that prototype.
> 
> That's good, but then I'm puzzled as to why your list of things to
> encrypt doesn't include all the things it already covers.

Well, I am starting with the things I _know_ need encrypting, and am
then waiting for others to tell me what to add.   Cybertec has not
provided a list and reasons yet, as far as I have seen.  This is why I
started this public thread, so we could get a list and agree on it.

FYI, I realize this is all very complex, and requires cryptography and
server internals knowledge.  I am happy to discuss it via voice with
anyone.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Mon, Oct 7, 2019 at 11:48 AM Bruce Momjian <bruce@momjian.us> wrote:
> Well, I am starting with the things I _know_ need encrypting, and am
> then waiting for others to tell me what to add.   Cybertec has not
> provided a list and reasons yet, that I have seen.  This is why I
> started this public thread, so we could get a list and agree on it.

Well that's fine, but you could also open up the patch and have a look
at it. Even if you just looked at which files it modifies, it would
enable you to add some important things to your list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Mon, Oct  7, 2019 at 12:30:37PM -0400, Robert Haas wrote:
> On Mon, Oct 7, 2019 at 11:48 AM Bruce Momjian <bruce@momjian.us> wrote:
> > Well, I am starting with the things I _know_ need encrypting, and am
> > then waiting for others to tell me what to add.   Cybertec has not
> > provided a list and reasons yet, that I have seen.  This is why I
> > started this public thread, so we could get a list and agree on it.
> 
> Well that's fine, but you could also open up the patch and have a look
> at it. Even if you just looked at which files it modifies, it would
> enable you to add some important things do your list.

Uh, I am really then just importing what one group decided, which seems
unsafe.  I think it needs a fresh look at all files.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Mon, Oct 7, 2019 at 12:34 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Oct  7, 2019 at 12:30:37PM -0400, Robert Haas wrote:
> > On Mon, Oct 7, 2019 at 11:48 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > Well, I am starting with the things I _know_ need encrypting, and am
> > > then waiting for others to tell me what to add.   Cybertec has not
> > > provided a list and reasons yet, that I have seen.  This is why I
> > > started this public thread, so we could get a list and agree on it.
> >
> > Well that's fine, but you could also open up the patch and have a look
> > at it. Even if you just looked at which files it modifies, it would
> > enable you to add some important things do your list.
>
> Uh, I am really then just importing what one group decided, which seems
> unsafe.  I think it needs a fresh look at all files.

A fresh look at all files is a good idea, but that doesn't make
looking at the work other people have already done a bad idea.

I don't understand the theory that it's useful to have multiple
100+-message email threads about what we ought to do, but that looking
at the already-written code is not useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
Magnus Hagander
Date:
On Mon, Oct 7, 2019 at 5:48 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Oct  7, 2019 at 11:26:24AM -0400, Robert Haas wrote:
> > On Mon, Oct 7, 2019 at 11:02 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > For clog, it is not append-only, and bytes are rewritten (from zero to
> > > non-zero), so there would have to be a new nonce for every clog file
> > > write to the file system.  We can store the nonce in a separate file,
> > > but the clog contents and nonce would have to be always synchronized or
> > > the file could not be properly read.  Basically every file we want to
> > > encrypt, needs this kind of study.
> >
> > Yeah. It's a big problem/project.
> >
> > Another approach to this problem would be to adjust the block format
> > to leave room for the nonce. If encryption is not in use, then those
> > bytes would just be zeroed or something. That would make upgrading a
> > bit tricky, but pg_upgrade could be taught to do the necessary
> > conversions for SLRUs without too much pain, I think.
>
> Yes, that is exactly the complexity we have to deal with, in terms of
> code complexity, reliability, and future maintenance.  Currently the
> file format is unchanged, but as we add more encrypted files, we might
> need to change it.  Fortunately, I think heap/index files don't need to
> change, so pg_upgrade will not require changes.

It does sound very similar to the problem of being able to add checksums to the clog files (and other SLRUs). So if that can get done, it would help both of those cases (if done right).

--

Re: Transparent Data Encryption (TDE) and encrypted files

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Oct 4, 2019 at 5:49 PM Bruce Momjian <bruce@momjian.us> wrote:
> > We spend a lot of time figuring out exactly how to safely encrypt WAL,
> > heap, index, and pgsql_tmp files.   The idea of doing this for another
> > 20 types of files --- to find a safe nonce, to be sure a file rewrite
> > doesn't reuse the nonce, figuring the API, crash recovery, forensics,
> > tool interface --- is something I would like to avoid.  I want to avoid
> > it not because I don't like work, but because I am afraid the code
> > impact and fragility will doom the feature.
> 
> I'm concerned about that, too, but there's no getting around the fact
> that there are a bunch of types of files and that they do all need to
> be dealt with. If we have a good scheme for doing that, hopefully
> extending it to additional types of files is not that bad, which would
> then spare us the trouble of arguing about each one individually, and
> also be more secure.
> 
> As I also said to Stephen, the people who are discussing this here
> should *really really really* be looking at the Cybertec patch instead
> of trying to invent everything from scratch -

Maybe it's enough to check the README.encryption file that [1] contains. Or
should I publish this (in shorter form) on the wiki [2]?

> In fact, I can think of a couple pretty clear examples, like the stats
> files, which clearly contain user data.

Specifically this part was removed because I expected that [3] would be
committed earlier than the encryption. This expectation still seems to be
valid.

The thread on encryption was very alive when I was working on the last version
of our patch, so it was hard to participate in the discussion. I tried to
catch up later, and I think I could understand most of the problems. It became
clear that it's better to collaborate than to incorporate the new ideas into
[1]. I proposed to Masahiko Sawada that we're ready to collaborate on coding
and he agreed. However the design doesn't seem to be stable enough at the
moment for coding to make sense.

As for the design, I spent some time thinking about it, especially on the
per-table/tablespace keys (recovery issues etc.), but haven't invented
anything new. If there's anything useful I can do about the feature, I'll be
glad to help.

[1] https://commitfest.postgresql.org/25/2104/

[2] https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

[3] https://commitfest.postgresql.org/25/1708/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Transparent Data Encryption (TDE) and encrypted files

From
Tomas Vondra
Date:
On Mon, Oct 07, 2019 at 10:22:22AM -0400, Bruce Momjian wrote:
>On Sat, Oct  5, 2019 at 09:13:59PM +0200, Tomas Vondra wrote:
>> On Fri, Oct 04, 2019 at 08:14:44PM -0400, Bruce Momjian wrote:
>> > On Sat, Oct  5, 2019 at 12:54:35AM +0200, Tomas Vondra wrote:
>> > > On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:
>> > > > For full-cluster TDE with AES-NI-enabled, the performance impact is
>> > > > usually ~4%, so doing anything more granular doesn't seem useful.  See
>> > > > this PGCon presentation with charts:
>> > > >
>> > > >     https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s
>> > > >
>> > > > Having anthing more fine-grained that all-cluster didn't seem worth it.
>> > > > Using per-user keys is useful, but also much harder to implement.
>> > > >
>> > >
>> > > Not sure I follow. I thought you are asking why Oracle apparently does
>> > > not leverage AES-NI for column-level encryption (at least according to
>> > > the document I linked)? And I don't know why that's the case.
>> >
>> > No, I read it as Oracle saying that there isn't much value to per-column
>> > encryption if you have crypto hardware acceleration, because the
>> > all-cluster encryption overhead is so minor.
>> >
>>
>> So essentially the argument is - if you have hw crypto acceleration (aka
>> AES-NI), then the overhead of all-cluster encryption is so low it does
>> not make sense to bother with lowering it with column encryption.
>
>Yes, I think that is true.  Column-level encryption can be useful in
>giving different people control of the keys, but I think that feature
>should be developed at the SQL level so clients can unlock the key and
>backups include the encryption keys.
>

FWIW that's not how column encryption works (at least in Oracle). It
uses the same encryption keys (with 2-tier key architecture), and the
keys are stored in a wallet. The user only supplies a passphrase (well,
a DBA does that, because it happens only once after the instance starts).

Not sure what exactly you mean by "SQL level" but I agree it's clearly
much higher up the stack than encryption at the block level.

>> IMO that's a good argument against column encryption (at least when used
>> to reduce overhead), although 10% still quite a bit.
>
>I think that test was a worst-case one and I think it needs to be
>optimized before we draw any conclusions.
>

What test? I was really referring to the PDF, which talks about a 10%
threshold for the tablespace encryption. And in another section it says

  Internal benchmark tests and customers reported a performance impact of 4
  to 8% in end-user response time, and an increase of 1 to 5% in CPU usage.

Of course, this is not on PostgreSQL, but I'd expect to have comparable
overhead, despite architectural differences. Ultimately, even if it's 15
or 20%, the general rule is likely to remain the same, i.e. column
encryption has significantly higher overhead, and can only beat
tablespace encryption when a very small fraction of columns is encrypted.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska <ah@cybertec.at> wrote:
> However the design doesn't seem to be stable enough at the
> moment for coding to make sense.

Well, I think the question is whether working further on your patch
could produce some things that everyone would agree are a step
forward.  If every iota of that patch is garbage dredged up from the
depths of the Mos Eisley sewers, then let's forget about it, but I
don't think that's the case. As I said on the thread about that patch,
and have also said here, what I learned from looking at that patch is
that the system probably needs some significant restructuring before
there's any hope of incorporating encryption in a reasonably-sized,
reasonably clean patch.  For example, some files need to be written a
block at a time instead of a character at a time. The idea just
discussed -- changing the CLOG page format to leave room for the
encryption nonce and a checksum -- also falls into that category. I
think there are probably a number of others.

No matter what anybody thinks about whether we should have one key,
multiple keys, passwords inside the database, passwords outside the
database, whatever ... that kind of restructuring work has got to be
done first. And it seems like by having all this discussion about the
design, we're basically getting to a situation where we're making no
progress on that stuff. So that's bad. There's nothing *wrong* with
talking about how many keys we had and how key management ought to
work and where passwords should be stored, and we need to make sure
that whatever we do initially doesn't close the door to doing more and
better things later. But, if those discussions have the effect of
blocking work on the basic infrastructure tasks that need to be done,
that's actually counterproductive at this stage.

We should all put our heads together and agree that however we think
key management ought to be handled, it'll be a lot easier to get our
preferred form of key management into PostgreSQL if, while that
discussion rages on, we knocked down some of the infrastructure
problems that *absolutely any patch* for this kind of feature is
certain to face.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Mon, Oct  7, 2019 at 09:40:22PM +0200, Tomas Vondra wrote:
> On Mon, Oct 07, 2019 at 10:22:22AM -0400, Bruce Momjian wrote:
> > > So essentially the argument is - if you have hw crypto acceleration (aka
> > > AES-NI), then the overhead of all-cluster encryption is so low it does
> > > not make sense to bother with lowering it with column encryption.
> > 
> > Yes, I think that is true.  Column-level encryption can be useful in
> > giving different people control of the keys, but I think that feature
> > should be developed at the SQL level so clients can unlock the key and
> > backups include the encryption keys.
> > 
> 
> FWIW that's not how the column encryption (at least in Oracle works). It
> uses the same encryption keys (with 2-tier key architecture), and the
> keys are stored in a wallet. The user only supplies a passphrase (well,
> a DBA does that, because it happens only once after the instance starts).
> 
> Not sure what exactly you mean by "SQL level" but I agree it's clearly
> much higher up the stack than encryption at the block level.

Right, what I was saying is that column encryption where the keys are
unlocked by the administrator is really only useful to reduce
encryption overhead, and I think we will find it just isn't worth the
API complexity to allow that.

Per-user keys are useful for cases beyond performance, but require
SQL-level control.

> > > IMO that's a good argument against column encryption (at least when used
> > > to reduce overhead), although 10% still quite a bit.
> > 
> > I think that test was a worst-case one and I think it needs to be
> > optimized before we draw any conclusions.
> 
> What test? I was really referring to the PDF, which talks about 10%
> threshold for the tablespace encryption. And in another section it says
> 
>  Internal benchmark tests and customers reported a performance impact of 4
>  to 8% in end-user response time, and an increase of 1 to 5% in CPU usage.
> 
> Of course, this is not on PostgreSQL, but I'd expect to have comparable
> overhead, despite architectural differences. Ultimately, even if it's 15
> or 20%, the general rule is likely to remain the same, i.e. column
> encryption has significantly higher overhead, and can only beat
> tablespace encryption when very small fraction of columns is encrypted.

Right, and I doubt it will be worth it, but I think we need to complete
all-cluster encryption and then run some tests to see what the overhead
is.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Ants Aasma
Date:
On Mon, 7 Oct 2019 at 18:02, Bruce Momjian <bruce@momjian.us> wrote:
Well, to do encryption properly, there is the requirement of the nonce.
If you ever rewrite a bit, you technically have to have a new nonce.
For WAL, since it is append-only, you can use the WAL file name.  For
heap/index files, we change the LSN on every rewrite (with
wal_log_hints=on), and we never use the same LSN for writing multiple
relations, so LSN+page-offset is a sufficient nonce.

For clog, it is not append-only, and bytes are rewritten (from zero to
non-zero), so there would have to be a new nonce for every clog file
write to the file system.  We can store the nonce in a separate file,
but the clog contents and nonce would have to be always synchronized or
the file could not be properly read.  Basically every file we want to
encrypt, needs this kind of study.
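The LSN+page-offset scheme described above can be sketched as follows. This is only an illustration of the idea, not code from any patch, and the exact field layout (8-byte LSN, 4-byte block number, 4 counter bytes left for CTR mode) is an assumption:

```c
#include <stdint.h>
#include <string.h>

typedef unsigned char PageIV[16];

/* Pack a page's LSN and block number into the upper 96 bits of a
 * 16-byte AES-CTR IV, leaving the low 32 bits for the cipher's
 * per-16-byte-block counter.  Layout is illustrative only. */
static void
build_page_iv(uint64_t page_lsn, uint32_t block_num, PageIV iv)
{
    memset(iv, 0, sizeof(PageIV));
    for (int i = 0; i < 8; i++)         /* bytes 0..7: big-endian LSN */
        iv[i] = (unsigned char) (page_lsn >> (8 * (7 - i)));
    for (int i = 0; i < 4; i++)         /* bytes 8..11: block number */
        iv[8 + i] = (unsigned char) (block_num >> (8 * (3 - i)));
    /* bytes 12..15 stay zero: counter space for CTR mode */
}
```

Under this scheme, every WAL-logged rewrite of a page changes its LSN (with wal_log_hints=on) and therefore yields a fresh IV, which is the property the paragraph above relies on.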
 
Yes. That is the reason why our current version doesn't encrypt SLRUs. There is some security in encrypting without a nonce when considering an attack vector that only sees one version of the encrypted page. But I think to make headway on this we need to figure out whether the TDE feature is useful without SLRU encryption (I think yes), and how hard it would be to properly encrypt SLRUs. Would the solution be acceptable for inclusion?

I can think of 3 options:

a) Separate nonce storage. Seems pretty bad complexity-wise: new data structures would need to be created, and SLRU writes would need to be WAL-logged with a full page image.
b) Inline nonces: the number of items per SLRU page varies depending on whether encryption is enabled.
c) Inline nonces: we reserve a header structure on all SLRU pages; pg_upgrade needs to rewrite persistent SLRUs.

None of the options seem great, but c) has the benefit of also carving out the space for SLRU checksums.
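Option (c) amounts to reserving a small header on every SLRU page, roughly as in the sketch below. The field widths are my own assumption for illustration; nothing like this structure exists in core or in any posted patch:

```c
#include <stdint.h>

#define BLCKSZ 8192                /* default PostgreSQL block size */

/* Hypothetical per-page header for option (c): space for a write
 * counter used as the encryption nonce, plus room for a checksum.
 * Field widths are illustrative only. */
typedef struct SlruPageHeader
{
    uint64_t    nonce;             /* bumped on every write of the page */
    uint32_t    checksum;          /* SLRU pages currently have none */
    uint32_t    reserved;          /* padding / future use */
} SlruPageHeader;

/* The usable payload shrinks by the header size, which is why
 * pg_upgrade would have to rewrite persistent SLRUs. */
#define SLRU_USABLE_BYTES (BLCKSZ - sizeof(SlruPageHeader))
```

Because the header steals bytes from every page, the number of transaction status entries per clog page would change, which is exactly the pg_upgrade cost mentioned above.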

> As I also said to Stephen, the people who are discussing this here
> should *really really really* be looking at the Cybertec patch instead
> of trying to invent everything from scratch - unless that patch has,

Someone from Cybertec is on the voice calls we have, and is actively
involved.

As far as I can tell, no-one from us is on the call. I personally missed the invitation when it was sent out. I would gladly share our learnings; a lot of what I see here is retreading what we already went through with our patch. However, I think that at the very least the conclusions, problems to work on, and WIP patch should be shared on-list. It's hard for anybody outside to have any input if there are no concrete design proposals or code to review. Moreover, I think e-mail is a much better medium for having a reasoned discussion about technical design decisions.
 
> In other words: maybe I'm wrong here, but it looks to me like we're
> laboriously reinventing the wheel when we could be working on
> improving the working prototype.

The work being done is building on that prototype.

We would like to help on that front.

Regards,
Ants Aasma 

Re: Transparent Data Encryption (TDE) and encrypted files

From
Antonin Houska
Date:
Ants Aasma <ants@cybertec.at> wrote:

> On Mon, 7 Oct 2019 at 18:02, Bruce Momjian <bruce@momjian.us> wrote:
> 
>>  Well, to do encryption properly, there is the requirement of the nonce. 
>>  If you ever rewrite a bit, you technically have to have a new nonce. 
>>  For WAL, since it is append-only, you can use the WAL file name.  For
>>  heap/index files, we change the LSN on every rewrite (with
>>  wal_log_hints=on), and we never use the same LSN for writing multiple
>>  relations, so LSN+page-offset is a sufficient nonce.
>> 
>>  For clog, it is not append-only, and bytes are rewritten (from zero to
>>  non-zero), so there would have to be a new nonce for every clog file
>>  write to the file system.  We can store the nonce in a separate file,
>>  but the clog contents and nonce would have to be always synchronized or
>>  the file could not be properly read.  Basically every file we want to
>>  encrypt, needs this kind of study.

> Yes. That is the reason why our current version doesn't encrypt
> SLRU's.

Actually there was one more problem: the AES-CBC cipher (or AES-XTS in the
earlier patch version) processes an encryption block of 16 bytes at a time.
Thus if only part of the block gets written (a torn page write), decryption of
the block results in garbage. Unlike relations, there is nothing like a
full-page write for SLRU pages, so there's no way to recover from this problem.

However, if the current plan is to use the CTR mode, this problem should not
happen.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Transparent Data Encryption (TDE) and encrypted files

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska <ah@cybertec.at> wrote:
> > However the design doesn't seem to be stable enough at the
> > moment for coding to make sense.
>
> Well, I think the question is whether working further on your patch
> could produce some things that everyone would agree are a step
> forward.

It would have made a lot of sense several months ago (Masahiko Sawada actually
used parts of our patch in the previous version of his patch, see [1]), but
the requirement to use a different IV for each execution of the encryption
changes things quite a bit.

Besides the relation pages and SLRU (CLOG), which are already being discussed
elsewhere in the thread, let's consider two other file types:

* Temporary files (buffile.c): we derive the IV from the PID of the process
  that created the file + segment number + block within the segment. This
  information does not change if you need to write the same block again. If a
  new IV should be used for each encryption run, we can simply introduce an
  in-memory counter that generates the IV for each block. However, it becomes
  trickier if the temporary file is shared by multiple backends. I think it
  might still be easier to expose the IV values to other backends via shared
  memory than to store them on disk ...

* "Buffered transient file". This is to be used instead of OpenTransientFile()
  if the user needs the option to encrypt the file. (Our patch adds this API
  to buffile.c. Currently we use it in reorderbuffer.c to encrypt the data
  changes produced by logical decoding, but there should be more use cases.)

  In this case we cannot keep the IVs in memory because the user can close the
  file at any time and open it much later. So we derive the IV by hashing the
  file path. However, if we should generate the IV again and again, we need to
  store it on disk in another way, probably one IV value per block
  (PGAlignedBlock).

  However, since our implementation of both these file types shares some code,
  it might yet be easier if the shared temporary file also stored the IV on
  disk instead of exposing it via shared memory ...
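The in-memory counter idea from the first bullet can be sketched minimally. This is only an illustration under the assumption that a single backend owns the counter; sharing the file across backends would require moving it into shared memory, as noted above:

```c
#include <stdint.h>

/* Per-backend counter handing out a fresh IV value for every block
 * encryption in temporary files.  Purely illustrative: these names
 * do not exist in any patch. */
static uint64_t buffile_iv_counter = 0;

static uint64_t
next_buffile_iv(void)
{
    /* Each encryption run of any block gets a value never used
     * before within this backend's lifetime. */
    return buffile_iv_counter++;
}
```

The point is that the IV no longer depends on the (reusable) block address, so rewriting the same block twice encrypts under two different IVs.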

Perhaps this is what I can work on, but I definitely need some feedback.

[1] https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=YaRnw@mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> Unless we are *absolutely* certain, I bet someone will be able to find a
> side-channel that somehow leaks some data or data-about-data, if we don't
> encrypt everything. If nothing else, you can get use patterns out of it,
> and you can make a lot from that. (E.g. by whether transactions are using
> multixacts or not you can potentially determine which transaction they are,
> if you know what type of transactions are being issued by the application.
> In the simplest case, there might be a single pattern where multixacts end
> up actually being used, and in that case being able to see the multixact
> data tells you a lot about the system).

Thanks for bringing up the concern but this still doesn't strike me, at
least, as being a huge gaping hole that people will have large issues
with.  In other words, I don't agree that this is a high bandwidth side
channel and I don't think that it, alone, brings up a strong need to
encrypt clog and multixact.

> As for other things -- by default, we store the log files in text format in
> the data directory. That contains *loads* of sensitive data in a lot of
> cases. Will those also be encrypted?

imv, this is a largely independent thing, as I said elsewhere, and has
its own set of challenges and considerations to deal with.

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Robert Haas
Date:
On Tue, Oct 8, 2019 at 7:52 AM Antonin Houska <ah@cybertec.at> wrote:
> * Temporary files (buffile.c): we derive the IV from PID of the process that
>   created the file + segment number + block within the segment. This
>   information does not change if you need to write the same block again. If
>   new IV should be used for each encryption run, we can simply introduce an
>   in-memory counter that generates the IV for each block. However it becomes
>   trickier if the temporary file is shared by multiple backends. I think it
>   might still be easier to expose the IV values to other backends via shared
>   memory than to store them on disk ...
>
> * "Buffered transient file". This is to be used instead of OpenTransientFile()
>   if user needs the option to encrypt the file. (Our patch adds this API to
>   buffile.c. Currently we use it in reorderbuffer.c to encrypt the data
>   changes produced by logical decoding, but there should be more use cases.)
>
>   In this case we cannot keep the IVs in memory because user can close the
>   file anytime and open it much later. So we derive the IV by hashing the file
>   path. However if we should generate the IV again and again, we need to store
>   it on disk in another way, probably one IV value per block (PGAlignedBlock).
>
>   However since our implementation of both these file types shares some code,
>   it might yet be easier if the shared temporary file also stored the IV on
>   disk instead of exposing it via shared memory ...
>
> Perhaps this is what I can work on, but I definitely need some feedback.

I think this would be a valuable thing upon which to work. I'm not
sure exactly what the right solution is, but it seems to me that it
would be a good thing if we tried to reuse the same solution in as
many places as possible. I don't know if it's realistic to use the
same method for storing IVs for temporary/transient files as we do for
SLRUs, but it would be nice if it were.

I think that one problem with trying to store the data in memory is
that these files get big enough that N bytes/block could still be
pretty big. For instance, if you're sorting 100GB of data with 8GB of
work_mem, you'll need to write 13 tapes and then merge them. Supposing
an IV of 12 bytes/block, the IV vector for each 8GB tape will be 12MB,
so once you've written all 13 tapes and are ready to merge them,
you're going to have 156MB of IV data floating around. If you keep it
in memory, it ought to count against your work_mem budget, and while
it's not a big fraction of your available memory, it's also not
negligible.  Worse (but less realistic) cases can also be constructed.
To avoid this kind of problem, you could write the IV data to disk.
But notice that tuplesort.c goes to a lot of work to make I/O
sequential, and that helps performance.  If you have to intersperse
reads of separate IV files with the reads of the main data files,
you're going to degrade the I/O pattern. It would really be best if
the IVs were in line with the data itself, I think. (The same probably
applies, and for not unrelated reasons, to SLRU data, if we're going
to try to encrypt that.)

Now, if you could store some kind of an IV "seed" where we only need
one per buffile rather than one per block, then that'd probably be
fine to store in memory. But I don't see how that would work given
that we can overwrite already-written blocks and need a new IV if we
do.
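The arithmetic above, spelled out under its stated assumptions (8GB tapes, 8kB blocks, 12-byte IVs; the macro names are mine, for illustration only):

```c
#include <stdint.h>

/* Back-of-envelope check of the IV bookkeeping cost described above:
 * 13 tapes of 8GB each, 8kB blocks, 12 bytes of IV per block. */
#define TAPE_BYTES   (8ULL * 1024 * 1024 * 1024)   /* 8GB per tape */
#define BLOCK_BYTES  8192ULL                       /* BLCKSZ */
#define IV_BYTES     12ULL
#define NUM_TAPES    13ULL

static uint64_t
iv_bytes_per_tape(void)
{
    /* 1M blocks per tape, 12 bytes each */
    return (TAPE_BYTES / BLOCK_BYTES) * IV_BYTES;
}

static uint64_t
iv_bytes_total(void)
{
    return NUM_TAPES * iv_bytes_per_tape();
}
```

That is 12MB per tape and 156MB across all 13 tapes, matching the figures in the paragraph above.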

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transparent Data Encryption (TDE) and encrypted files

From
"Moon, Insung"
Date:
Hello.

On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Robert Haas <robertmhaas@gmail.com> wrote:
>
> > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska <ah@cybertec.at> wrote:
> > > However the design doesn't seem to be stable enough at the
> > > moment for coding to make sense.
> >
> > Well, I think the question is whether working further on your patch
> > could produce some things that everyone would agree are a step
> > forward.
>
> It would have made a lot of sense several months ago (Masahiko Sawada actually
> used parts of our patch in the previous version of his patch (see [1]), but
> the requirement to use a different IV for each execution of the encryption
> changes things quite a bit.
>
> Besides the relation pages and SLRU (CLOG), which are already being discussed
> elsewhere in the thread, let's consider other two file types:
>
> * Temporary files (buffile.c): we derive the IV from PID of the process that
>   created the file + segment number + block within the segment. This
>   information does not change if you need to write the same block again. If
>   new IV should be used for each encryption run, we can simply introduce an
>   in-memory counter that generates the IV for each block. However it becomes
>   trickier if the temporary file is shared by multiple backends. I think it
>   might still be easier to expose the IV values to other backends via shared
>   memory than to store them on disk ...

I think of encrypting temporary files in a slightly different way.
Previously, I had a lot of trouble with IV uniqueness, so I have
proposed a unique encryption key for each file.

First, in the CTR mode to be used, 32 bits of the 128-bit nonce value
are used for the counter.
The counter increases every time 16 bytes are encrypted, so
theoretically, while the upper 96 bits of the nonce stay the same, a
total of 64 GiB can be encrypted.

Therefore, in the case of buffile.c, which creates temporary files when
work_mem overflows, each file is at most 1 GiB, so it is possible to
encrypt sufficiently safely with a simple IV value.
The problem is the vulnerability that occurs when the 96-bit nonce
values, excluding the counter, are the same.

I also tried to generate the IV using PID (32 bits) + tempCounter (64
bits) at first, but in the worst case the same PID and tempCounter
values are reused.
Therefore, I focused on the uniqueness of the encryption key rather
than on the uniqueness of the IV value.

The encryption key is a separate key for each file, as described earlier.
First, a hash value is generated randomly for the file, and the key is
derived from that hash value and the KEK (or MDEK) using HMAC-SHA256.
In this case, there is no need to store the encryption key separately,
so it does not have to be kept in a separate IV file or in memory.
(The IV is a 64-bit hash value plus a 32-bit counter.)

Also, the temporary file name is currently PID.tempFileCounter, but if
this is changed to PID.tempFileCounter.hashvalue, any process can
encrypt and decrypt the file.

Reference URL
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption


>
> * "Buffered transient file". This is to be used instead of OpenTransientFile()
>   if user needs the option to encrypt the file. (Our patch adds this API to
>   buffile.c. Currently we use it in reorderbuffer.c to encrypt the data
>   changes produced by logical decoding, but there should be more use cases.)

Agreed.

Best regards.
Moon.

>
>   In this case we cannot keep the IVs in memory because user can close the
>   file anytime and open it much later. So we derive the IV by hashing the file
>   path. However if we should generate the IV again and again, we need to store
>   it on disk in another way, probably one IV value per block (PGAlignedBlock).
>
>   However since our implementation of both these file types shares some code,
>   it might yet be easier if the shared temporary file also stored the IV on
>   disk instead of exposing it via shared memory ...
>
> Perhaps this is what I can work on, but I definitely need some feedback.
>
> [1] https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=YaRnw@mail.gmail.com
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>
>



Re: Transparent Data Encryption (TDE) and encrypted files

From
"Moon, Insung"
Date:
Dear hackers.

First, I don't know in which email thread I should write a reply,
so I am using the first email thread.
Sorry about the inconvenience...

Sawada-san and I have previously researched which PostgreSQL database
cluster files contain user data.
The results have been updated on the wiki page[1], so I am sharing them here.

These results are simply a list of files that contain user data, so we
can think of them as the first step in classifying which files should
be encrypted.
As for the SLRU files that we have talked about so far, I think the
discussion about the necessity of encrypting them is still in progress,
and I hope this list will be useful there.
#In proceeding with the current development, we specified the files to
encrypt using the list above.

If your survey results differ, it would help this project if you
correct the wiki page.

[1]
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_contains_of_user_data_for_PostgreSQL_files

Best regards.
Moon.

On Tue, Oct 1, 2019 at 6:26 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> For full-cluster Transparent Data Encryption (TDE), the current plan is
> to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> overflow).  The plan is:
>
>         https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
>
> We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> other files.  Is that correct?  Do any other PGDATA files contain user
> data?
>
> --
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +
>
>



Re: Transparent Data Encryption (TDE) and encrypted files

From
Antonin Houska
Date:
Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote:

> Hello.
>
> On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska <ah@cybertec.at> wrote:
> >
> > Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska <ah@cybertec.at> wrote:
> > > > However the design doesn't seem to be stable enough at the
> > > > moment for coding to make sense.
> > >
> > > Well, I think the question is whether working further on your patch
> > > could produce some things that everyone would agree are a step
> > > forward.
> >
> > It would have made a lot of sense several months ago (Masahiko Sawada actually
> > used parts of our patch in the previous version of his patch (see [1]), but
> > the requirement to use a different IV for each execution of the encryption
> > changes things quite a bit.
> >
> > Besides the relation pages and SLRU (CLOG), which are already being discussed
> > elsewhere in the thread, let's consider other two file types:
> >
> > * Temporary files (buffile.c): we derive the IV from PID of the process that
> >   created the file + segment number + block within the segment. This
> >   information does not change if you need to write the same block again. If
> >   new IV should be used for each encryption run, we can simply introduce an
> >   in-memory counter that generates the IV for each block. However it becomes
> >   trickier if the temporary file is shared by multiple backends. I think it
> >   might still be easier to expose the IV values to other backends via shared
> >   memory than to store them on disk ...
>
> I think encrypt a temporary file in a slightly different way.
> Previously, I had a lot of trouble with IV uniqueness, but I have
> proposed a unique encryption key for each file.
>
> First, in the case of the CTR mode to be used, 32 bits are used for
> the counter in the 128-bit nonce value.
> Here, the counter increases every time 16 bytes are encrypted, and
> theoretically, if nonce 96 bits are the same, a total of 64 GiB can be
> encrypted.

> Therefore, in the case of buffile.c that creates a temporary file due
> to lack of work_mem, it is possible to use up to 1GiB per file, so it
> is possible to encrypt to a simple IV value sufficiently safely.
> The problem is that a vulnerability occurs when 96-bit nonce values
> excluding Counter are the same values.

I don't think the lower 32 bits impose any limitation, see
CRYPTO_ctr128_encrypt_ctr32() in OpenSSL: if this lower part overflows, the
upper part is simply incremented. So it's up to the user to decide what
portion of the IV he wants to control and what portion should be controlled by
OpenSSL internally. Of course the application design should be such that no
overflows into the upper (user specific) part occur because those would result
in duplicate IVs.
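The carry behaviour described above can be illustrated with a minimal big-endian 128-bit increment, mimicking what OpenSSL's ctr128 code does internally (illustrative only, not the actual OpenSSL implementation):

```c
/* Increment a 16-byte big-endian CTR counter block by one, letting
 * carries from the low 32 bits propagate into the upper 96 bits,
 * as OpenSSL's ctr128 code effectively does. */
static void
ctr128_inc(unsigned char counter[16])
{
    for (int i = 15; i >= 0; i--)
    {
        if (++counter[i] != 0)
            break;          /* stop once a byte doesn't wrap to zero */
        /* byte wrapped to zero: carry into the next byte up */
    }
}
```

When the low 32 bits are all ones, the next increment zeroes them and bumps byte 11, i.e. the overflow lands in the user-controlled upper part of the IV; that is exactly the duplicate-IV hazard the paragraph above warns about.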

> I also tried to generate IV using PID (32bit) + tempCounter (64bit) at
> first, but in the worst-case PID and tempCounter are used in the same
> values.
> Therefore, the uniqueness of the encryption key was considered without
> considering the uniqueness of the IV value.

If you consider a 64-bit counter insufficient (here it seems that
tempCounter counts the 1GB segments), then we can't even use the LSN as
the IV for relation pages.

> The encryption key uses a separate key for each file, as described earlier.

Do you mean a separate key for the whole temporary file, or for a single (1GB)
segment?

> First, it generates a hash value randomly for the file, and uses the
> hash value and KEK (or MDEK) to derive and use the key with
> HMAC-SHA256.
> In this case, there is no need to store the encryption key separately
> if it is not necessary to keep it in a separate IV file or memory.
> (IV is a hash value of 64 bits and a counter of 32 bits.)

You seem to miss the fact that a user of buffile.c can seek in the file
and rewrite an arbitrary part. Thus you'd have to generate a new key for
the part being changed.

I think it's easier to use the same key for the whole 1GB segment, if
not for the whole temporary file, and generate a unique IV each time we
write a chunk (BLCKSZ bytes).

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Transparent Data Encryption (TDE) and encrypted files

From
"Moon, Insung"
Date:
Dear Antonin Houska.
Thank you for your attention to this matter.

On Wed, Oct 9, 2019 at 2:42 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote:
>
> > Hello.
> >
> > On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska <ah@cybertec.at> wrote:
> > >
> > > Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska <ah@cybertec.at> wrote:
> > > > > However the design doesn't seem to be stable enough at the
> > > > > moment for coding to make sense.
> > > >
> > > > Well, I think the question is whether working further on your patch
> > > > could produce some things that everyone would agree are a step
> > > > forward.
> > >
> > > It would have made a lot of sense several months ago (Masahiko Sawada actually
> > > used parts of our patch in the previous version of his patch (see [1]), but
> > > the requirement to use a different IV for each execution of the encryption
> > > changes things quite a bit.
> > >
> > > Besides the relation pages and SLRU (CLOG), which are already being discussed
> > > elsewhere in the thread, let's consider other two file types:
> > >
> > > * Temporary files (buffile.c): we derive the IV from PID of the process that
> > >   created the file + segment number + block within the segment. This
> > >   information does not change if you need to write the same block again. If
> > >   new IV should be used for each encryption run, we can simply introduce an
> > >   in-memory counter that generates the IV for each block. However it becomes
> > >   trickier if the temporary file is shared by multiple backends. I think it
> > >   might still be easier to expose the IV values to other backends via shared
> > >   memory than to store them on disk ...
> >
> > I think encrypt a temporary file in a slightly different way.
> > Previously, I had a lot of trouble with IV uniqueness, but I have
> > proposed a unique encryption key for each file.
> >
> > First, in the case of the CTR mode to be used, 32 bits are used for
> > the counter in the 128-bit nonce value.
> > Here, the counter increases every time 16 bytes are encrypted, and
> > theoretically, if nonce 96 bits are the same, a total of 64 GiB can be
> > encrypted.
>
> > Therefore, in the case of buffile.c that creates a temporary file due
> > to lack of work_mem, it is possible to use up to 1GiB per file, so it
> > is possible to encrypt to a simple IV value sufficiently safely.
> > The problem is that a vulnerability occurs when 96-bit nonce values
> > excluding Counter are the same values.
>
> I don't think the lower 32 bits impose any limitation, see
> CRYPTO_ctr128_encrypt_ctr32() in OpenSSL: if this lower part overflows, the
> upper part is simply incremented. So it's up to the user to decide what
> portion of the IV he wants to control and what portion should be controlled by
> OpenSSL internally. Of course the application design should be such that no
> overflows into the upper (user specific) part occur because those would result
> in duplicate IVs.

I'm sorry, I seem to have misunderstood.
When I rechecked the OpenSSL source code, it is as you said: the upper
96-bit value is incremented, using the ctr96_inc() function.
Sorry..

>
> > I also tried to generate IV using PID (32bit) + tempCounter (64bit) at
> > first, but in the worst-case PID and tempCounter are used in the same
> > values.
> > Therefore, the uniqueness of the encryption key was considered without
> > considering the uniqueness of the IV value.
>
> If you consider 64bit counter insufficient (here it seems that tempCounter
> counts the 1GB segments), then we can't even use LSN as the IV for relation
> pages.

The worst case here is not a lack of tempCounter, but the problem that
occurs when a PID is reused after a certain period.
Of course, it is very unlikely to be a problem because it is a
temporary file, but since the file name reveals the PID and
tempFileCounter, if you accumulate some data, the same key and the same
IV could end up being used to encrypt other data. So I thought there
could be a problem.


>
> > The encryption key uses a separate key for each file, as described earlier.
>
> Do you mean a separate key for the whole temporary file, or for a single (1GB)
> segment?

Yes, that's right. Use a separate key per file.

>
> > First, it generates a hash value randomly for the file, and uses the
> > hash value and KEK (or MDEK) to derive and use the key with
> > HMAC-SHA256.
> > In this case, there is no need to store the encryption key separately
> > if it is not necessary to keep it in a separate IV file or memory.
> > (IV is a hash value of 64 bits and a counter of 32 bits.)
>
> You seem to miss the fact that user of buffile.c can seek in the file and
> rewrite arbitrary part. Thus you'd have to generate a new key for the part
> being changed.

That's right. I wanted to ask about this too.
Is it possible to overwrite data already written via the actual buffile.c?
Such a problem seems to arise when the BufFileWrite function is called,
then BufFileSeek is called, and then BufFileRead is called.
In other words, the file is not written in units of 8kB; instead the
position within the file is changed, and some data is read at another
position.
I also thought this would be a problem when re-creating the encrypted
file, i.e., an IV and key change would be necessary.
So far, my research has found no case of overwriting data at a previous
position after it has already been written to the file (where
BufFileWrite is called).
Can you tell me of a case that overwrites a buffer file? Sorry..


>
> I think it's easier to use the same key for the whole 1GB segment if not for
> the whole temporary file, and generate an unique IV each time we write a chung
> (BLCKSZ bytes).

Yes. I think there will probably be a discussion about which enc-key
and IV scheme to use.
I hope we find the safest way through these discussions.

Best regards.
Moon.

>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com



Re: Transparent Data Encryption (TDE) and encrypted files

From
Antonin Houska
Date:
Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote:

> On Wed, Oct 9, 2019 at 2:42 PM Antonin Houska <ah@cybertec.at> wrote:
> >
> > Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote:
> >
> > > I also tried to generate IV using PID (32bit) + tempCounter (64bit) at
> > > first, but in the worst-case PID and tempCounter are used in the same
> > > values.
> > > Therefore, the uniqueness of the encryption key was considered without
> > > considering the uniqueness of the IV value.
> >
> > If you consider 64bit counter insufficient (here it seems that tempCounter
> > counts the 1GB segments), then we can't even use LSN as the IV for relation
> > pages.
>
> The worst-case here is not a lack of tempCounter, but a problem that
> occurs when PID is reused after a certain period.
> Of course, it is very unlikely to be a problem because it is a
> temporary file, but since the file name can know the PID and
> tempFileCounter, if you accumulate some data, the same key and the
> same IV will be used to encrypt other data. So I thought there could
> be a problem.

ok

> > > First, it generates a hash value randomly for the file, and uses the
> > > hash value and KEK (or MDEK) to derive and use the key with
> > > HMAC-SHA256.
> > > In this case, there is no need to store the encryption key separately
> > > if it is not necessary to keep it in a separate IV file or memory.
> > > (IV is a hash value of 64 bits and a counter of 32 bits.)
> >
> > You seem to miss the fact that user of buffile.c can seek in the file and
> > rewrite arbitrary part. Thus you'd have to generate a new key for the part
> > being changed.
>
> That's right. I wanted to ask this too.
> Is it possible to overwrite the data already written in the actual buffile.c?
> Such a problem seems to become a problem when BufFileWRite function is
> called, and BufFileSeek function is called, and BufFileRead is called.
> In other words, the file is not written in units of 8kb, but the file
> is changed in the pos, and some data is read in another pos.

v04-0011-Make-buffile.c-aware-of-encryption.patch in [1] changes buffile.c so
that data is read and written in 8kB blocks if encryption is enabled. In order
to record the IV per block, the computation of the buffer position within the
file would have to be adjusted somehow. I can check it soon but not in the
next few days.
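To make the block-oriented scheme concrete, here is a rough sketch of the
per-block IV bookkeeping being discussed (Python, purely illustrative; the
patch itself is C, and a SHA-256 counter-mode keystream stands in for
AES-CTR since the Python stdlib has no AES -- the class and names are
invented for the example, not taken from the patch):

```python
import hashlib
import os

BLCKSZ = 8192  # encrypt/decrypt whole 8kB blocks, as in the patch

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Illustrative CTR-style keystream: SHA-256(key || iv || counter).
    # A real implementation would use AES-CTR here instead.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

class EncryptedBufFile:
    """Toy buffile: data stored encrypted, one IV per 8kB block."""
    def __init__(self, key: bytes):
        self.key = key
        self.blocks = {}   # block number -> ciphertext
        self.ivs = {}      # block number -> IV used for that block

    def write_block(self, blkno: int, plaintext: bytes):
        assert len(plaintext) == BLCKSZ
        iv = os.urandom(16)  # fresh IV on every (re)write of the block
        ks = keystream(self.key, iv, BLCKSZ)
        self.blocks[blkno] = bytes(p ^ k for p, k in zip(plaintext, ks))
        self.ivs[blkno] = iv

    def read_block(self, blkno: int) -> bytes:
        ks = keystream(self.key, self.ivs[blkno], BLCKSZ)
        return bytes(c ^ k for c, k in zip(self.blocks[blkno], ks))
```

The point of the sketch is that rewriting a block must pick a new IV; with a
stream cipher, reusing (key, IV) on changed data leaks the XOR of the two
plaintexts, which is exactly the overwrite hazard discussed above.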

> I also thought that this would be a problem with re-creating the
> encrypted file, i.e., IV and key change would be necessary,
> So far, my research has found no case of overwriting data in the
> previous pos after it has already been created in File data (where
> FilWrite is called).
> Can you tell me the case overwriting buffer file?

(I suppose you mean BufFileWrite(), not FileWrite()). I don't remember if I
ever checked a particular use case in the PG core, but as long as the
buffile.c API allows such a thing to happen, the encryption code needs to
handle it anyway.

v04-0012-Add-tests-for-buffile.c.patch in [1] contains regression tests that
do involve temp file overwriting.

[1] https://www.postgresql.org/message-id/7082.1562337694@localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> On Thu, Oct 3, 2019 at 4:40 PM Stephen Frost <sfrost@snowman.net> wrote:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> > > On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian <bruce@momjian.us> wrote:
> > > > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > > > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > > > overflow).  The plan is:
> > > >
> > > >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> > > >
> > > > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact,
> > or
> > > > other files.  Is that correct?  Do any other PGDATA files contain user
> > > > data?
> > >
> > > As others have said, that sounds wrong to me.  I think you need to
> > > encrypt everything.
> >
> > That isn't what other database systems do though and isn't what people
> > actually asking for this feature are expecting to have or deal with.
>
> Do any of said other database even *have* the equivalence of say pg_clog or
> pg_multixact *stored outside their tablespaces*? (Because as long as the
> data is in the tablespace, it's encrypted when using tablespace
> encryption..)

That's a fair question and while I'm not specifically sure about all of
them, I do believe you're right that for some, the tablespace/database
includes that information (and WAL) instead of having it external.  I'm
also pretty sure that there's still enough information that isn't
encrypted to at least *start* the database server.  In many ways, we are
unfortunately the oddball when it comes to having these cluster-level
things that we probably do want to encrypt (I'd be thinking more about
pg_authid here than clog, and potentially the WAL).

I've been meaning to write up a wiki page or something on this but I
just haven't found time, so I'm going to give up on that and just share
my thoughts here and folks can do with them what they wish-

When it comes to use-cases and attack vectors, I feel like there's
really two "big" choices, and I'd like us to support both, ideally, but
it boils down to this: do you trust the database maintenance, et al,
processes, or no?  The same question, put another way, is, do you trust
having unencrypted/sensitive data in shared buffers?

Let's talk through these for a minute:

Yes, shared_buffers is trusted implies:

- More data (usefully or not) can be encrypted
  - WAL, clog, multixact, pg statistics, et al
- Various PG processes need to know the decryption keys necessary
  (autovacuum, crash recovery, being big ones)
  ... ideally, we could still *start*, which is why I continue to argue
  that we shouldn't encrypt *everything* because not being able to even
  start the database system really sucks.  What exactly it is that we
  need I don't know off-hand, maybe we don't need clog, but it seems
  likely we'll need pg_controldata, for example.  My gut feeling on this
  is really that we need enough to start and open up the vault- which
  probably means that the vault needs to look more like what I describe
  below in the situation where you don't trust shared_buffers, to the
  point where we might have separate WAL/clog/et al for the vault itself
- Fewer limitations (indexes can work more-or-less as-is, for example)
- Attack vectors:
  - Anything that can access shared buffers can get a ton of data
  - Bugs in PG that expose memory can be leveraged to get access to data
    and keys
  - root on the system can pretty trivially gain access to everything
  - If someone steals the disks/backups, they can't get access to much
  - Or, if your cloud/storage vendor decides to snoop around they can't
    see much

No, shared_buffers is NOT trusted implies:

- we need enough unencrypted data to bring the system up and online and
  working (crash recovery and autovacuum need to work) - this likely
  implies that things like WAL, clog, et al, have to be mostly
  unencrypted, to allow these processes to work
- Limitations on indexes (we can't have the index have unencrypted data,
  but we also have to have autovacuum able to work...  I actually wonder
  if this might be something we could solve by encrypting the internal
  pages, leaving the TIDs exposed so that they can be cleaned up but
  leaf pages have their own ordering so that's not great...  I suspect
  something like this is the reason for the index limitation in other
  database systems that support column-level encryption)
- Sensitive data in WAL is already encrypted
- All decryption happens in a given backend when it's sending data to
  the client
- Attack vectors:
  - root can watch network traffic or individual sessions, possibly gain
    access to keys (certainly with more difficulty though)
  - Bugs in PG shouldn't make it very easy for an external attacker to
    gain access to anything except what they already had access to
    (sure, they could see shared buffers and see what's in their
    backend, but everything in shared buffers that's sensitive should be
    encrypted, and for the most part what's in their backend should only
    be things they're allowed to access anyway)
  - If someone steals the disks/backups, they could potentially figure
    out more information about what was happening on the system
  - Or, if your cloud/storage vendor decides to snoop around, they could
    possibly figure things out

And then, of course, you can get into the fun of, well, maybe we should
have both options be supported at the same time.

Looking from an attack-vector standpoint, if the concern is primarily
about external attackers through SQL injection and database bugs, not
trusting shared buffers is pretty clearly the way to go.  If the concern
is about stealing hard drives or backups, well, FDE is a great solution
there, along with encrypted backups, but, sure, if we rule those out for
some reason then we can say that, yes, this will be helpful for that
kind of an attack.

In either case, we do need a vaulting system, and I think we need to be
able to start up PG and get the vault open and accept connections.

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Mon, Oct  7, 2019 at 12:34:36PM -0400, Bruce Momjian wrote:
> On Mon, Oct  7, 2019 at 12:30:37PM -0400, Robert Haas wrote:
> > On Mon, Oct 7, 2019 at 11:48 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > Well, I am starting with the things I _know_ need encrypting, and am
> > > then waiting for others to tell me what to add.   Cybertec has not
> > > provided a list and reasons yet, that I have seen.  This is why I
> > > started this public thread, so we could get a list and agree on it.
> > 
> > Well that's fine, but you could also open up the patch and have a look
> > at it. Even if you just looked at which files it modifies, it would
> > enable you to add some important things to your list.
> 
> Uh, I am really then just importing what one group decided, which seems
> unsafe.  I think it needs a fresh look at all files.

Someone has written a list of all PGDATA files so their TDE status can be
recorded:


https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_contains_of_user_data_for_PostgreSQL_files

Feel free to update it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Craig Ringer
Date:
On Wed, 9 Oct 2019 at 22:30, Stephen Frost <sfrost@snowman.net> wrote:

- All decryption happens in a given backend when it's sending data to
  the client

That is not what I think of as TDE. But upon review, it looks like I'm wrong, and the usual usage of TDE is for server-side-only encryption at-rest.

But when I'm asked about TDE, people are generally actually asking for data that's encrypted at rest and in transit, where the client driver is responsible for data encryption/decryption transparently to the application. The server is expected to be able to mark columns as encrypted, so it can report the column's true datatype while storing a bytea-like encrypted value for it instead. In this case the server does not know the column encryption/decryption key at all, and it cannot perform any operations on the data except for input and output.

Some people ask for indexable encrypted columns, but I tend to explain to them how impractical and inefficient that is. You can support hash indexes if you don't salt the encrypted data, but that greatly weakens the encryption by allowing attackers to use dictionary attacks and other brute force techniques efficiently. And you can't support b-tree > and < without very complex encryption schemes (https://en.wikipedia.org/wiki/Homomorphic_encryption).
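A few lines make the dictionary-attack problem with unsalted hashes concrete
(Python, with plain SHA-256 standing in for whatever deterministic scheme a
hash index would use; the values are invented for the example):

```python
import hashlib
import os

def unsalted_hash(value: bytes) -> bytes:
    # Deterministic: equal plaintexts always produce equal digests.
    return hashlib.sha256(value).digest()

def salted_hash(value: bytes, salt: bytes) -> bytes:
    return hashlib.sha256(salt + value).digest()

# An index over unsalted hashes of a low-entropy column (e.g. SSNs):
stored = unsalted_hash(b"123-45-6789")

# Anyone who obtains the index can mount an offline dictionary attack,
# hashing every candidate value and comparing:
dictionary = [b"111-11-1111", b"123-45-6789", b"999-99-9999"]
recovered = [w for w in dictionary if unsalted_hash(w) == stored]
assert recovered == [b"123-45-6789"]

# A per-row random salt defeats the dictionary attack, but then the
# index can no longer locate a row from the probe value alone:
salt = os.urandom(16)
stored_salted = salted_hash(b"123-45-6789", salt)
assert unsalted_hash(b"123-45-6789") != stored_salted
```

The tension is exactly the one described above: determinism is what makes the
index usable, and determinism is what makes the brute force efficient.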

I see quite a lot of demand for this column level driver-assisted encryption. I think it'd actually be quite simple for the PostgreSQL server to provide support for it too, since most of the work is done by the driver. But I won't go into the design here since this thread appears to be about encryption at rest only, fully server-side.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Craig Ringer (craig@2ndquadrant.com) wrote:
> On Wed, 9 Oct 2019 at 22:30, Stephen Frost <sfrost@snowman.net> wrote:
> > - All decryption happens in a given backend when it's sending data to
> >   the client
>
> That is not what I think of as TDE. But upon review, it looks like I'm
> wrong, and the usual usage of TDE is for server-side-only encryption
> at-rest.

Yes, that's typically what TDE is, at least in the relational DBMS
world.

> But when I'm asked about TDE, people are generally actually asking for data
> that's encrypted at rest and in transit, where the client driver is
> responsible for data encryption/decryption transparently to the
> application. The server is expected to be able to mark columns as
> encrypted, so it can report the column's true datatype while storing a
> bytea-like encrypted value for it instead. In this case the server does not
> know the column encryption/decryption key at all, and it cannot perform any
> operations on the data except for input and output.

This is definitely also a thing though I'm not sure what it's called,
exactly.  Having everything happen on the client side is also,
certainly, a better solution as it removes the risk of root on the
database server being able to gain access to the data.  This is also
what I recommend in a lot of situations- have the client side
application handle the encryption/decryption, working with a vaulting
solution ideally, but it'd definitely be neat to add this as a
capability to PG.

> Some people ask for indexable encrypted columns, but I tend to explain to
> them how impractical and inefficient that is. You can support hash indexes
> if you don't salt the encrypted data, but that greatly weakens the
> encryption by allowing attackers to use dictionary attacks and other brute
> force techniques efficiently. And you can't support b-tree > and < without
> very complex encryption schemes (
> https://en.wikipedia.org/wiki/Homomorphic_encryption).

I'm not sure why you wouldn't salt the hash..?  That's pretty important,
imv, and, of course, you have to store the salt but that shouldn't be
that big of a deal, I wouldn't think.  Agreed that you can't support
b-tree (even with complex encryption schemes..., I've read some papers
about how just </> is enough to be able to glean a good bit of info
from, not super relevant to the overall discussion here so I won't go
hunt them down right now, but if there's interest, I can try to do so).

> I see quite a lot of demand for this column level driver-assisted
> encryption. I think it'd actually be quite simple for the PostgreSQL server
> to provide support for it too, since most of the work is done by the
> driver. But I won't go into the design here since this thread appears to be
> about encryption at rest only, fully server-side.

Yes, that's what this thread is about, but I very much like the idea of
driver-assisted encryption on the client side and would love it if
someone had time to work on it.

Thanks,

Stephen


Re: Transparent Data Encryption (TDE) and encrypted files

From
Masahiko Sawada
Date:
On Wed, Oct 9, 2019 at 3:57 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote:
>
> > On Wed, Oct 9, 2019 at 2:42 PM Antonin Houska <ah@cybertec.at> wrote:
> > >
> > > Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote:
> > >
> > > > I also tried to generate the IV from PID (32 bits) + tempCounter (64
> > > > bits) at first, but in the worst case the same PID and tempCounter
> > > > values get reused.
> > > > Therefore, I ensured the uniqueness of the encryption key rather than
> > > > the uniqueness of the IV value.
> > >
> > > If you consider 64bit counter insufficient (here it seems that tempCounter
> > > counts the 1GB segments), then we can't even use LSN as the IV for relation
> > > pages.
> >
> > The worst case here is not a lack of tempCounter bits, but a problem that
> > occurs when a PID is reused after a certain period.
> > Of course, it is very unlikely to be a problem because it is a
> > temporary file, but since the PID and tempFileCounter can be learned
> > from the file name, if you accumulate some data, the same key and the
> > same IV will be used to encrypt other data. So I thought there could
> > be a problem.
>
> ok
>
> > > > First, it randomly generates a hash value for the file, and uses that
> > > > hash value and the KEK (or MDEK) to derive the key with HMAC-SHA256.
> > > > In this case, there is no need to store the encryption key separately,
> > > > unless it must be kept in a separate IV file or memory.
> > > > (The IV is a 64-bit hash value and a 32-bit counter.)
> > >
> > > You seem to miss the fact that user of buffile.c can seek in the file and
> > > rewrite arbitrary part. Thus you'd have to generate a new key for the part
> > > being changed.
> >
> > That's right. I wanted to ask this too.
> > Is it possible to overwrite the data already written in the actual buffile.c?
> > Such a problem seems to arise when the BufFileWrite function is
> > called, then BufFileSeek is called, and then BufFileRead is called.
> > In other words, the file is not written in units of 8kB; rather the
> > position is changed and some data is read at another position.
>
> v04-0011-Make-buffile.c-aware-of-encryption.patch in [1] changes buffile.c so
> that data is read and written in 8kB blocks if encryption is enabled. In order
> to record the IV per block, the computation of the buffer position within the
> file would have to be adjusted somehow. I can check it soon but not in the
> next few days.

As far as I read the patch, the nonce consists of pid, counter and
block number, where the counter is incremented each time a BufFile is
created. Therefore the buffer data could be rewritten with the same
nonce and key, which is bad.

So I think we can have a rewrite counter for the block in each 8kB
block header. The nonce then consists of the block number within a
segment file (4 bytes), temp file counter (8 bytes), rewrite counter
(2 bytes) and CTR mode counter (2 bytes). And then if we have a
single-use encryption key per backend process, I guess we can
guarantee the uniqueness of the combination of key and nonce.
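Packed into a 128-bit IV, that layout would look roughly like this (a Python
sketch for illustration only; the field order and big-endian packing are my
guesses, not anything specified in the thread):

```python
import struct

def temp_file_nonce(blkno: int, file_counter: int,
                    rewrite_counter: int, ctr_counter: int) -> bytes:
    """Pack the four proposed fields into a 16-byte (128-bit) nonce."""
    assert blkno < 2**32           # block number within segment: 4 bytes
    assert file_counter < 2**64    # temp file counter: 8 bytes
    assert rewrite_counter < 2**16 # per-block rewrite counter: 2 bytes
    assert ctr_counter < 2**16     # CTR mode counter: 2 bytes
    return struct.pack(">IQHH", blkno, file_counter,
                       rewrite_counter, ctr_counter)

# Rewriting the same block bumps only the rewrite counter, so the
# (key, nonce) pair never repeats under a single-use per-backend key:
n1 = temp_file_nonce(blkno=7, file_counter=42, rewrite_counter=0, ctr_counter=0)
n2 = temp_file_nonce(blkno=7, file_counter=42, rewrite_counter=1, ctr_counter=0)
assert len(n1) == 16 and n1 != n2
```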

Regards,

--
Masahiko Sawada



Re: Transparent Data Encryption (TDE) and encrypted files

From
Antonin Houska
Date:
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Wed, Oct 9, 2019 at 3:57 PM Antonin Houska <ah@cybertec.at> wrote:
> >
> > Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote:
> >
> > v04-0011-Make-buffile.c-aware-of-encryption.patch in [1] changes buffile.c so
> > that data is read and written in 8kB blocks if encryption is enabled. In order
> > to record the IV per block, the computation of the buffer position within the
> > file would have to be adjusted somehow. I can check it soon but not in the
> > next few days.
>
> As far as I read the patch, the nonce consists of pid, counter and
> block number, where the counter is incremented each time a BufFile is
> created. Therefore the buffer data could be rewritten with the same
> nonce and key, which is bad.

This patch was written before the requirement on non-repeating IV was raised,
and it does not use the AES-CTR mode. I mentioned it here because it reads /
writes data in 8kB blocks.

> So I think we can have a rewrite counter for the block in each 8kB
> block header. The nonce then consists of the block number within a
> segment file (4 bytes), temp file counter (8 bytes), rewrite counter
> (2 bytes) and CTR mode counter (2 bytes). And then if we have a
> single-use encryption key per backend process, I guess we can
> guarantee the uniqueness of the combination of key and nonce.

Since the segment size is 1 GB, a segment consists of 2^17 blocks, so 4 bytes
will not be fully utilized.

As for the "CTR mode counter", consider that it gets incremented once per 16
bytes of input. So even if BLCKSZ is 32 kB, we need no more than 11 bits for
this counter.

If these two parts become smaller, we can perhaps increase the size of the
"rewrite counter".
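The arithmetic behind those two sizes can be checked in a few lines:

```python
SEGMENT_SIZE = 1 << 30   # 1 GB temp file segment
BLCKSZ = 8192            # default block size
AES_BLOCK = 16           # the CTR counter advances once per 16 input bytes

blocks_per_segment = SEGMENT_SIZE // BLCKSZ
assert blocks_per_segment == 2**17       # block number fits in 17 bits

# Even with the largest block size considered (32 kB):
ctr_steps_per_block = (32 * 1024) // AES_BLOCK
assert ctr_steps_per_block == 2**11      # CTR counter fits in 11 bits

# The proposal allotted 4 + 2 = 6 bytes (48 bits) to these two fields;
# only 17 + 11 = 28 bits are actually needed, freeing up to 20 bits
# that could instead enlarge the rewrite counter.
```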

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Transparent Data Encryption (TDE) and encrypted files

From
Masahiko Sawada
Date:
On Mon, Oct 14, 2019 at 3:42 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > On Wed, Oct 9, 2019 at 3:57 PM Antonin Houska <ah@cybertec.at> wrote:
> > >
> > > Moon, Insung <tsukiwamoon.pgsql@gmail.com> wrote:
> > >
> > > v04-0011-Make-buffile.c-aware-of-encryption.patch in [1] changes buffile.c so
> > > that data is read and written in 8kB blocks if encryption is enabled. In order
> > > to record the IV per block, the computation of the buffer position within the
> > > file would have to be adjusted somehow. I can check it soon but not in the
> > > next few days.
> >
> > As far as I read the patch, the nonce consists of pid, counter and
> > block number, where the counter is incremented each time a BufFile is
> > created. Therefore the buffer data could be rewritten with the same
> > nonce and key, which is bad.
>
> This patch was written before the requirement on non-repeating IV was raised,
> and it does not use the AES-CTR mode. I mentioned it here because it reads /
> writes data in 8kB blocks.
>
> > So I think we can have a rewrite counter for the block in each 8kB
> > block header. The nonce then consists of the block number within a
> > segment file (4 bytes), temp file counter (8 bytes), rewrite counter
> > (2 bytes) and CTR mode counter (2 bytes). And then if we have a
> > single-use encryption key per backend process, I guess we can
> > guarantee the uniqueness of the combination of key and nonce.
>
> Since the segment size is 1 GB, a segment consists of 2^17 blocks, so 4 bytes
> will not be fully utilized.
>
> As for the "CTR mode counter", consider that it gets incremented once per 16
> bytes of input. So even if BLCKSZ is 32 kB, we need no more than 11 bits for
> this counter.
>
> If these two parts become smaller, we can perhaps increase the size of the
> "rewrite counter".

Yeah, I designed it that way to make the implementation easier, but we can
increase the size of the rewrite counter to 3 bytes while the block number
uses 3 bytes.

Regards,

--
Masahiko Sawada



Re: Transparent Data Encryption (TDE) and encrypted files

From
Bruce Momjian
Date:
On Thu, Oct 10, 2019 at 10:40:37AM -0400, Stephen Frost wrote:
> > Some people ask for indexable encrypted columns, but I tend to explain to
> > them how impractical and inefficient that is. You can support hash indexes
> > if you don't salt the encrypted data, but that greatly weakens the
> > encryption by allowing attackers to use dictionary attacks and other brute
> > force techniques efficiently. And you can't support b-tree > and < without
> > very complex encryption schemes (
> > https://en.wikipedia.org/wiki/Homomorphic_encryption).
> 
> I'm not sure why you wouldn't salt the hash..?  That's pretty important,
> imv, and, of course, you have to store the salt but that shouldn't be
> that big of a deal, I wouldn't think.  Agreed that you can't support
> b-tree (even with complex encryption schemes..., I've read some papers
> about how just </> is enough to be able to glean a good bit of info
> from, not super relevant to the overall discussion here so I won't go
> hunt them down right now, but if there's interest, I can try to do so).

Yes, you can add salt to the value you store in the hash index, but when
you are looking for a matching value, how do you know what salt to use
to find it in the index?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Transparent Data Encryption (TDE) and encrypted files

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, Oct 10, 2019 at 10:40:37AM -0400, Stephen Frost wrote:
> > > Some people ask for indexable encrypted columns, but I tend to explain to
> > > them how impractical and inefficient that is. You can support hash indexes
> > > if you don't salt the encrypted data, but that greatly weakens the
> > > encryption by allowing attackers to use dictionary attacks and other brute
> > > force techniques efficiently. And you can't support b-tree > and < without
> > > very complex encryption schemes (
> > > https://en.wikipedia.org/wiki/Homomorphic_encryption).
> >
> > I'm not sure why you wouldn't salt the hash..?  That's pretty important,
> > imv, and, of course, you have to store the salt but that shouldn't be
> > that big of a deal, I wouldn't think.  Agreed that you can't support
> > b-tree (even with complex encryption schemes..., I've read some papers
> > about how just </> is enough to be able to glean a good bit of info
> > from, not super relevant to the overall discussion here so I won't go
> > hunt them down right now, but if there's interest, I can try to do so).
>
> Yes, you can add salt to the value you store in the hash index, but when
> you are looking for a matching value, how do you know what salt to use
> to find it in the index?

Yeah, if the only value you have to look up with is the unencrypted
sensitive information itself then you'd have to have the data hashed
without a salt.

If the application had some way of providing a salt and then sending it
to the database as part of the query, then you could (we used to do
exactly this with md5...).  This probably gets to be pretty use-case
specific, but it seems like if we had a data type for "hashed value,
optionally including a salt" which could then be used with a hash index,
it'd be pretty helpful for users.
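The lookup problem, and the workaround where the application supplies the
salt, can be sketched like so (Python; all names and values are invented for
the illustration, and a real design would presumably use a keyed construction
like HMAC rather than bare SHA-256):

```python
import hashlib
import os

def salted_hash(value: bytes, salt: bytes) -> bytes:
    return hashlib.sha256(salt + value).digest()

# Each row stores the salt next to the hash of the sensitive column:
salt = os.urandom(16)
rows = [(salt, salted_hash(b"alice@example.com", salt))]

# Without knowing the salt in advance, an equality probe cannot use a
# hash index: the probe must be rehashed against every row's salt,
# which amounts to a sequential scan.
def lookup_without_salt(probe: bytes):
    return [row for row in rows if salted_hash(probe, row[0]) == row[1]]

# If the application tracks the salt (say, one per user), it can compute
# the salted hash itself, and the server does a plain index lookup:
def lookup_with_salt(probe: bytes, known_salt: bytes):
    target = salted_hash(probe, known_salt)
    return [row for row in rows if row[1] == target]
```

This is the use-case-specific part: something on the client side has to know
which salt goes with which value for the indexed lookup to work at all.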

Thanks,

Stephen
