Thread: WIP: Data at rest encryption
Hi all,

I have been working on data-at-rest encryption support for PostgreSQL.
In my experience this is a common request that customers make. The short
of the feature is that all PostgreSQL data files are encrypted with a
single master key and are decrypted when read from the OS. It does not
provide column level encryption, which is an almost orthogonal feature,
arguably better done client side.

Similar things can be achieved with filesystem level encryption. However
this is not always optimal for various reasons. One of the better reasons
is the desire for HSM based encryption in a storage area network based
setup.

Attached to this mail is a work in progress patch that adds an extensible
encryption mechanism. There are some loose ends left to tie up, but the
general concept and architecture is at a point where it's ready for some
feedback, fresh ideas and bikeshedding.

Usage
=====

Set up the database like so:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     initdb -k -K pgcrypto $PGDATA )

Start PostgreSQL:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     postgres $PGDATA )

Design
======

The patch adds a new GUC called encryption_library; when specified, the
named library is loaded before shared_preload_libraries and is expected
to register its encryption routines. For now the API is pretty narrow:
one parameterless function that lets the extension do key setup on its
own terms, and two functions for encrypting/decrypting an arbitrary sized
block of data with a tweak. The tweak should alter the encryption
function so that identical block contents are encrypted differently based
on their location.

The GUC needs to be set at bootstrap time, so it gets set by a new option
for initdb. During bootstrap an encryption sample gets stored in the
control file, enabling useful error messages. The library name is not
stored in controldata. I'm not quite sure about this decision. On one
hand it would be very useful to tell the user what he needs to get at his
data if the configuration somehow goes missing, and it would get rid of
the extra GUC. On the other hand I don't really want to bloat control
data, and the same encryption algorithm could be provided by different
implementations.

For now the encryption is done for everything that goes through md, xlog
and slru. Based on a review of read/write/fread/fwrite calls this list
is missing:

* BufFile - needs refactoring
* Logical reorder buffer serialization - probably needs a stream mode
  cipher API addition.
* logical_heap_rewrite - can be encrypted as one big block
* 2PC state data - ditto
* pg_stat_statements - query texts get appended so a stream mode cipher
  might be needed here too.

copydir needed some changes too because tablespace and database oid are
included in the tweak, and so copying also needs to decrypt and encrypt
with the new tweak value.

For demonstration purposes I imported Brian Gladman's AES-128-XTS mode
implementation into pgcrypto and used an environment variable for key
setup. This part is not really in any reviewable state: the XTS code
needs heavy cleanup to bring it up to PostgreSQL coding standards, and
key setup needs something secure, like PBKDF2 or scrypt.

Performance with the current AES implementation is not great, but not
horrible either; I'm seeing around a 2x slowdown for
larger-than-shared_buffers, smaller-than-free-memory workloads. However
the plan is to fix this - I have a prototype AES-NI implementation that
does 3GB/s per core on my Haswell based laptop (1.25 B/cycle).
Open questions
==============

The main question is what to do about BufFile? It currently provides both
unaligned random access and a block based interface. I wonder if it would
be a good idea to refactor it to be fully block based under the covers.

I would also like to incorporate some database identifier as a salt in
key setup. However, the system identifier stored in the control file
doesn't fit this role well. It gets initialized somewhat too late in the
bootstrap process, and more importantly, gets changed on pg_upgrade. This
will make link mode upgrades impossible, which seems like a no go. I'm
torn whether to add a new value for this purpose (perhaps stored outside
the control file) or allow setting of the system identifier via initdb.
The first seems like a better idea; the file could double as a place to
store additional encryption parameters, like key length or a different
cipher primitive.

Regards,
Ants Aasma
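To make the tweak-based API described above concrete, here is a minimal,
standalone sketch of the idea (Python with the third-party cryptography
package, purely for illustration - the actual patch is C code inside
pgcrypto, and the exact tweak layout below is an assumption; the mail only
states that tablespace and database oid are part of it):

    # pip install cryptography -- illustrative sketch only, not the patch's C code
    import os
    import struct
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    BLCKSZ = 8192

    def make_tweak(spc_oid, db_oid, block_num):
        # Hypothetical layout: pack location info into the 16-byte XTS tweak so
        # that identical page contents encrypt differently at different locations.
        return struct.pack("<IIQ", spc_oid, db_oid, block_num)

    def encrypt_block(key, tweak, page):
        enc = Cipher(algorithms.AES(key), modes.XTS(tweak)).encryptor()
        return enc.update(page) + enc.finalize()

    def decrypt_block(key, tweak, page):
        dec = Cipher(algorithms.AES(key), modes.XTS(tweak)).decryptor()
        return dec.update(page) + dec.finalize()

    key = os.urandom(32)            # AES-128-XTS uses two 128-bit keys
    page = b"\x00" * BLCKSZ         # two identical pages ...
    c0 = encrypt_block(key, make_tweak(1663, 16384, 0), page)
    c1 = encrypt_block(key, make_tweak(1663, 16384, 1), page)
    assert c0 != c1                 # ... encrypt differently at different blocks
    assert decrypt_block(key, make_tweak(1663, 16384, 0), c0) == page

The point of the tweak is visible in the asserts: the same all-zero page
encrypts to different ciphertext at different block numbers, which is what
prevents identical pages from revealing themselves as identical on disk.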
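On the key setup point ("key setup needs something secure, like PBKDF2 or
scrypt") and the open question of salting it with a per-cluster
identifier, a minimal sketch of such a derivation (standard-library
Python; the iteration count and the way the salt is obtained are
assumptions, not taken from the patch):

    # Sketch of passphrase-based key setup with a per-cluster salt
    import hashlib
    import os

    def derive_master_key(passphrase: bytes, cluster_salt: bytes) -> bytes:
        # PBKDF2-HMAC-SHA256; a real implementation would tune the iteration
        # count (or use scrypt via hashlib.scrypt) and persist the salt with
        # the cluster, e.g. in a small file created at initdb time.
        return hashlib.pbkdf2_hmac("sha256", passphrase, cluster_salt,
                                   100_000, dklen=32)

    cluster_salt = os.urandom(16)   # stand-in for a database identifier
    key = derive_master_key(b"Postgres passphrase", cluster_salt)
    assert len(key) == 32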
On Tue, Jun 7, 2016 at 11:56 PM, Ants Aasma <ants.aasma@gmail.com> wrote:
> Hi all,
>
> I have been working on data-at-rest encryption support for PostgreSQL.
> In my experience this is a common request that customers make. The
> short of the feature is that all PostgreSQL data files are encrypted
> with a single master key and are decrypted when read from the OS. It
> does not provide column level encryption which is an almost orthogonal
> feature, arguably better done client side.
>
> Similar things can be achieved with filesystem level encryption.
> However this is not always optimal for various reasons. One of the
> better reasons is the desire for HSM based encryption in a storage
> area network based setup.
>
> Attached to this mail is a work in progress patch that adds an
> extensible encryption mechanism. There are some loose ends left to tie
> up, but the general concept and architecture is at a point where it's
> ready for some feedback, fresh ideas and bikeshedding.

Yes, encryption is really a nice and wanted feature. Following are my
thoughts regarding the approach.

1. Instead of encrypting all database files, how about providing the user
an option to protect only the particular tables that need encryption, at
table/tablespace level? This not only provides an option to the user, it
reduces the performance impact on tables that don't need any encryption.
The problem with this approach is that every xlog record needs to handle
the encryption/decryption individually, instead of doing it at the page
level.

2. Instead of depending on a contrib module for the encryption, how about
integrating the pgcrypto contrib module into core and adding that as the
default encryption method? Also provide an option for the user to use a
different encryption method if needed.

3. Currently entire xlog pages are encrypted and stored in the file. Can
pg_xlogdump work with those files?

4. For logical decoding, how about adding a decoding behavior based on
the module to decide whether data is to be encrypted/decrypted?

5. Instead of providing the passphrase through an environment variable,
it would be better to provide some options to pg_ctl etc.

6. I don't have any idea whether it is possible to integrate the checksum
and encryption in a single pass to avoid the performance penalty.

> I would also like to incorporate some database identifier as a salt in
> key setup. However, system identifier stored in control file doesn't
> fit this role well. It gets initialized somewhat too late in the
> bootstrap process, and more importantly, gets changed on pg_upgrade.
> This will make link mode upgrades impossible, which seems like a no
> go. I'm torn whether to add a new value for this purpose (perhaps
> stored outside the control file) or allow setting of system identifier
> via initdb. The first seems like a better idea, the file could double
> as a place to store additional encryption parameters, like key length
> or different cipher primitive.

I feel a separate file is better for holding the key data than the
pg_control file.

Regards,
Hari Babu
Fujitsu Australia
On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> 1. Instead of encrypting all database files, how about providing the
> user an option to protect only the particular tables that need
> encryption, at table/tablespace level? This not only provides an option
> to the user, it reduces the performance impact on tables that don't
> need any encryption. The problem with this approach is that every xlog
> record needs to handle the encryption/decryption individually, instead
> of doing it at the page level.

Is there a real need for this? The customers I have talked to want to
encrypt the whole database and my goal is to make the feature fast enough
to make that feasible for pretty much everyone. I guess switching
encryption off per table would be feasible, but the key setup would still
need to be done at server startup. Per record encryption would result in
some additional information leakage though. Overall I thought it would
not be worth it, but I'm willing to have my mind changed on this.

> 2. Instead of depending on a contrib module for the encryption, how
> about integrating the pgcrypto contrib module into core and adding that
> as the default encryption method? Also provide an option for the user
> to use a different encryption method if needed.

Technically that would be simple enough; this is more of a policy
decision. I think having builtin encryption provided by pgcrypto is
completely fine. If a consensus emerges that it needs to be integrated,
it would need to be a separate patch anyway.

> 3. Currently entire xlog pages are encrypted and stored in the file.
> Can pg_xlogdump work with those files?

Technically yes, with the patch as it stands, no. Added this to my todo
list.

> 4. For logical decoding, how about adding a decoding behavior based on
> the module to decide whether data is to be encrypted/decrypted?

The data to be encrypted does not depend on the module used, so I don't
think it should be module controlled. The reorder buffer contains pretty
much the same stuff as the xlog, so not encrypting it does not look like
a valid choice. For logical heap rewrites it could be argued that nothing
useful is leaked in most cases, but encrypting it is not hard. Just a
small matter of programming.

> 5. Instead of providing the passphrase through an environment variable,
> it would be better to provide some options to pg_ctl etc.

That looks like it would be worse from a security perspective.
Integrating a passphrase prompt would be an option, but a way for scripts
to provide passphrases would still be needed.

> 6. I don't have any idea whether it is possible to integrate the
> checksum and encryption in a single pass to avoid the performance
> penalty.

Currently no; the checksum gets stored in the page header, and for any
decent cipher mode the encryption of the rest of the page will depend on
it. However, the performance difference should be negligible because both
algorithms are compute bound for cached data. The data is very likely to
be completely in L1 cache as the operations are done in quick succession.

The non-cryptographic checksum algorithm could actually be an attack
vector for an adversary that can trigger repeated encryption, by tweaking
a couple of bytes at the end of the page to see when the checksum matches
and trying to infer the data from that, similarly to the CRIME attack.
However, the LSN stored at the beginning of the page header basically
provides a nonce that makes this impossible. This also means that
encryption needs to imply wal_log_hints.
Will include this in the next version of the patch.

>> I would also like to incorporate some database identifier as a salt in
>> key setup. However, system identifier stored in control file doesn't
>> fit this role well. It gets initialized somewhat too late in the
>> bootstrap process, and more importantly, gets changed on pg_upgrade.
>> This will make link mode upgrades impossible, which seems like a no
>> go. I'm torn whether to add a new value for this purpose (perhaps
>> stored outside the control file) or allow setting of system identifier
>> via initdb. The first seems like a better idea, the file could double
>> as a place to store additional encryption parameters, like key length
>> or different cipher primitive.
>
> I feel a separate file is better for holding the key data than the
> pg_control file.

I guess that would be more flexible. However I think at least the fact
that the database is encrypted should remain in the control file to
provide useful error messages for faulty backup procedures.

Thanks for your input.

Regards,
Ants Aasma
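A sketch of how the encryption sample mentioned earlier could be used for
those error messages (the check shown here is an assumption; the mails
only say that a sample is stored in the control file during bootstrap):

    # Hypothetical startup check against the stored encryption sample
    KNOWN_PLAINTEXT = b"PostgreSQL encryption sample\x00\x00\x00\x00"  # 32 bytes
    ZERO_TWEAK = bytes(16)

    def store_sample(control_file: dict, encrypt_block, key) -> None:
        # Run once at bootstrap: remember what a known constant encrypts to.
        control_file["encryption_sample"] = encrypt_block(key, ZERO_TWEAK,
                                                          KNOWN_PLAINTEXT)

    def check_key_at_startup(control_file: dict, encrypt_block, key) -> None:
        sample = control_file.get("encryption_sample")
        if sample is None:
            raise RuntimeError("cluster is not encrypted, but an encryption "
                               "key was provided")
        if encrypt_block(key, ZERO_TWEAK, KNOWN_PLAINTEXT) != sample:
            # Fail with a clear message (e.g. after a faulty backup/restore
            # that lost the key setup) instead of serving up garbage pages.
            raise RuntimeError("encryption key does not match this cluster")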
On Sun, Jun 12, 2016 at 4:13 PM, Ants Aasma <ants.aasma@gmail.com> wrote:
>> I feel a separate file is better for holding the key data than the
>> pg_control file.
>
> I guess that would be more flexible. However I think at least the fact
> that the database is encrypted should remain in the control file to
> provide useful error messages for faulty backup procedures.

Another possibility could be to always do some encryption at the
data-type level for text data. For example, I recalled the following
thing while going through this thread:
https://github.com/nec-postgres/tdeforpg
Though I don't quite understand the use for encrypt.enable in this
code... This has the advantage of not patching upstream.
-- 
Michael
On Mon, Jun 13, 2016 at 5:17 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sun, Jun 12, 2016 at 4:13 PM, Ants Aasma <ants.aasma@gmail.com> wrote: >>> I feel separate file is better to include the key data instead of pg_control >>> file. >> >> I guess that would be more flexible. However I think at least the fact >> that the database is encrypted should remain in the control file to >> provide useful error messages for faulty backup procedures. > > Another possibility could be always to do some encryption at data-type > level for text data. For example I recalled the following thing while > going through this thread: > https://github.com/nec-postgres/tdeforpg > Though I don't quite understand the use for encrypt.enable in this > code... This has the advantage to not patch upstream. While certainly possible, this does not cover the requirements I want to satisfy - user data never gets stored on disk unencrypted without making changes to the application or schema. This seems to be mostly about separating administrator roles, specifically that centralised storage and backup administrators should not have access to database contents. I see this as orthogonal to per column encryption, which in my opinion is better done in the application. Regards, Ants Aasma
On 6/7/16 9:56 AM, Ants Aasma wrote:
> Similar things can be achieved with filesystem level encryption.
> However this is not always optimal for various reasons. One of the
> better reasons is the desire for HSM based encryption in a storage
> area network based setup.

Could you explain this in more detail?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 6/12/16 3:13 AM, Ants Aasma wrote:
>> 5. Instead of providing the passphrase through an environment variable,
>> it would be better to provide some options to pg_ctl etc.
>
> That looks like it would be worse from a security perspective.
> Integrating a passphrase prompt would be an option, but a way for
> scripts to provide passphrases would still be needed.

Environment variables and command-line options are visible to other
processes on the machine, so neither of these approaches is really going
to work. We would need some kind of integration with secure
password-entry mechanisms, such as pinentry.

Also note that all tools that work directly on the data directory would
need password-entry and encryption/decryption support, including
pg_basebackup, pg_controldata, pg_ctl, pg_receivexlog, pg_resetxlog,
pg_rewind, pg_upgrade, pg_xlogdump. It seems that your implementation
doesn't encrypt pg_control, thus avoiding some of that. But that doesn't
seem right.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
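The command-line part of this concern is easy to see on Linux (a quick
illustration only; the /proc interface shown is Linux-specific and nothing
here comes from the patch). Note that /proc/<pid>/environ is readable only
by the process owner and root, so an exported passphrase is somewhat less
exposed than a command-line option, but it is still visible to same-user
processes and to root:

    # Command-line arguments of any local process are world-readable via /proc.
    import os

    def cmdline_of(pid: str) -> list[str]:
        with open("/proc/%s/cmdline" % pid, "rb") as f:
            return [a.decode(errors="replace") for a in f.read().split(b"\x00") if a]

    for pid in (p for p in os.listdir("/proc") if p.isdigit()):
        try:
            args = cmdline_of(pid)
        except OSError:          # process exited or not readable
            continue
        # A hypothetical "postgres --encryption-key=..." option would show up
        # here to every user on the machine.
        if args and "postgres" in os.path.basename(args[0]):
            print(pid, " ".join(args))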
On Sun, Jun 12, 2016 at 5:13 PM, Ants Aasma <ants.aasma@gmail.com> wrote:
> On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>
>> 2. Instead of depending on a contrib module for the encryption, how
>> about integrating the pgcrypto contrib module into core and adding
>> that as the default encryption method? Also provide an option for the
>> user to use a different encryption method if needed.
>
> Technically that would be simple enough; this is more of a policy
> decision. I think having builtin encryption provided by pgcrypto is
> completely fine. If a consensus emerges that it needs to be
> integrated, it would need to be a separate patch anyway.

In our proprietary database, we use the encryption methods provided by
OpenSSL [1]. Maybe we can have a look at the methods provided by OpenSSL
for use in encryption for builds under USE_SSL. Ignore this if you have
already evaluated them.

>> 5. Instead of providing the passphrase through an environment variable,
>> it would be better to provide some options to pg_ctl etc.
>
> That looks like it would be worse from a security perspective.
> Integrating a passphrase prompt would be an option, but a way for
> scripts to provide passphrases would still be needed.

What I felt was that if we store the passphrase in an environment
variable, a person who has access to the system can get it, and using
that it may be possible to decrypt the data files.

[1] - https://www.openssl.org/docs/manmaster/crypto/EVP_EncryptInit.html

Regards,
Hari Babu
Fujitsu Australia
On 6/12/16 2:13 AM, Ants Aasma wrote: > On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi > <kommi.haribabu@gmail.com> wrote: >> > 1. Instead of doing the entire database files encryption, how about >> > providing user an option to protect only some particular tables that >> > wants the encryption at table/tablespace level. This not only provides >> > an option to the user, it reduces the performance impact on tables >> > that doesn't need any encryption. The problem with this approach >> > is that every xlog record needs to validate to handle the encryption >> > /decryption, instead of at page level. > Is there a real need for this? The customers I have talked to want to > encrypt the whole database and my goal is to make the feature fast > enough to make that feasible for pretty much everyone. I guess > switching encryption off per table would be feasible, but the key > setup would still need to be done at server startup. Per record > encryption would result in some additional information leakage though. > Overall I thought it would not be worth it, but I'm willing to have my > mind changed on this. I actually design with this in mind. Tables that contain sensitive info go into designated schemas, partly so that you can blanket move all of those to an encrypted tablespace (or safer would be to move things not in those schemas to an unencrypted tablespace). Since that can be done with an encrypted filesystem maybe that's good enough. (It's not really clear to me what this buys us over an encrypted FS, other than a feature comparison checkmark...) -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On 06/14/2016 09:59 PM, Jim Nasby wrote:
> On 6/12/16 2:13 AM, Ants Aasma wrote:
>> On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>>> 1. Instead of doing the entire database files encryption, how about
>>> providing user an option to protect only some particular tables that
>>> wants the encryption at table/tablespace level. This not only
>>> provides an option to the user, it reduces the performance impact on
>>> tables that doesn't need any encryption. The problem with this
>>> approach is that every xlog record needs to validate to handle the
>>> encryption/decryption, instead of at page level.
>> Is there a real need for this? The customers I have talked to want to
>> encrypt the whole database and my goal is to make the feature fast
>> enough to make that feasible for pretty much everyone. I guess
>> switching encryption off per table would be feasible, but the key
>> setup would still need to be done at server startup. Per record
>> encryption would result in some additional information leakage though.
>> Overall I thought it would not be worth it, but I'm willing to have my
>> mind changed on this.
>
> I actually design with this in mind. Tables that contain sensitive
> info go into designated schemas, partly so that you can blanket move
> all of those to an encrypted tablespace (or safer would be to move
> things not in those schemas to an unencrypted tablespace). Since that
> can be done with an encrypted filesystem maybe that's good enough.
> (It's not really clear to me what this buys us over an encrypted FS,
> other than a feature comparison checkmark...)

The reason why this is needed is actually very simple: security
guidelines and legal requirements ...

We have dealt with a couple of companies recently who explicitly demanded
PostgreSQL level encryption in a transparent way to fulfill some internal
or legal requirements. This is especially true for financial stuff. And
yes, sure ... you can do a lot of stuff with filesystem encryption. The
core idea of this entire thing is, however, to have a counterpart on the
database level. If you don't have the key you cannot start the instance,
and if you happen to get access to the filesystem you are still not able
to fire up the DB. As I said: requirements by ever bigger companies.

As far as benchmarking is concerned: I did a quick test yesterday (not
with the final AES implementation yet) and I got pretty good results.
With a reasonably well cached database in a typical application I expect
to lose around 10-20%. If everything fits in memory there is 0 loss of
course. The worst I got with the standard AES (no hardware support used
yet) was around a 45% loss, but this requires a value as low as 32 MB of
shared buffers or so.

Many thanks,

Hans

-- 
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
On Mon, Jun 13, 2016 at 11:07 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 6/7/16 9:56 AM, Ants Aasma wrote:
>>
>> Similar things can be achieved with filesystem level encryption.
>> However this is not always optimal for various reasons. One of the
>> better reasons is the desire for HSM based encryption in a storage
>> area network based setup.
>
> Could you explain this in more detail?

I don't think Ants ever responded to this point.

I'm curious whether this is something that is likely to be pursued for
PostgreSQL 11.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jun 12, 2017 at 10:38 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jun 13, 2016 at 11:07 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 6/7/16 9:56 AM, Ants Aasma wrote: >>> >>> Similar things can be achieved with filesystem level encryption. >>> However this is not always optimal for various reasons. One of the >>> better reasons is the desire for HSM based encryption in a storage >>> area network based setup. >> >> Could you explain this in more detail? > > I don't think Ants ever responded to this point. > > I'm curious whether this is something that is likely to be pursued for > PostgreSQL 11. Yes, the plan is to pick it up again, Real Soon Now(tm). There are a couple of loose ends for stuff that should be encrypted, but in the current state of the patch aren't yet (from the top of my head, logical decoding and pg_stat_statements write some files). The code handling keys could really take better precautions as Peter pointed out in another e-mail. And I expect there to be a bunch of polishing work to make the APIs as good as they can be. To answer Peter's question about HSMs, many enterprise deployments are on top of shared storage systems. For regulatory reasons or to limit security clearance of storage administrators, the data on shared storage should be encrypted. Now for there to be any point to this endeavor, the key needs to be stored somewhere else. This is where hardware security modules come in. They are basically hardware key storage appliances that can either output the key when requested, or for higher security hold onto the key and perform encryption/decryption on behalf of the user. The patch enables the user to use a custom shell command to go and fetch the key from the HSM, for example using the KMIP protocol. Or a motivated person could write an extension that implements the encryption hooks to delegate encryption/decryption of blocks to an HSM. Fundamentally there doesn't seem to be a big benefit of implementing the encryption at PostgreSQL level instead of the filesystem. The patch doesn't take any real advantage from the higher level knowledge of the system, nor do I see much possibility for it to do that. The main benefit for us is that it's much easier to get a PostgreSQL based solution deployed. I'm curious if the community thinks this is a feature worth having? Even considering that security experts would classify this kind of encryption as a checkbox feature. Regards, Ants Aasma
On 6/12/17 17:11, Ants Aasma wrote:
> I'm curious if the community thinks this is a feature worth having?
> Even considering that security experts would classify this kind of
> encryption as a checkbox feature.

File system encryption already exists and is well-tested. I don't see
any big advantages in re-implementing all of this one level up. You
would have to touch every single place in PostgreSQL backend and tool
code where a file is being read or written. Yikes.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Ants, all, * Ants Aasma (ants.aasma@eesti.ee) wrote: > Yes, the plan is to pick it up again, Real Soon Now(tm). There are a > couple of loose ends for stuff that should be encrypted, but in the > current state of the patch aren't yet (from the top of my head, > logical decoding and pg_stat_statements write some files). The code > handling keys could really take better precautions as Peter pointed > out in another e-mail. And I expect there to be a bunch of polishing > work to make the APIs as good as they can be. Very glad to hear that you're going to be continuing to work on this effort. > To answer Peter's question about HSMs, many enterprise deployments are > on top of shared storage systems. For regulatory reasons or to limit > security clearance of storage administrators, the data on shared > storage should be encrypted. Now for there to be any point to this > endeavor, the key needs to be stored somewhere else. This is where > hardware security modules come in. They are basically hardware key > storage appliances that can either output the key when requested, or > for higher security hold onto the key and perform > encryption/decryption on behalf of the user. The patch enables the > user to use a custom shell command to go and fetch the key from the > HSM, for example using the KMIP protocol. Or a motivated person could > write an extension that implements the encryption hooks to delegate > encryption/decryption of blocks to an HSM. An extension, or perhaps even something built-in, would certainly be good here but I don't think it's necessary in an initial implementation as long as it's something we can do later. > Fundamentally there doesn't seem to be a big benefit of implementing > the encryption at PostgreSQL level instead of the filesystem. The > patch doesn't take any real advantage from the higher level knowledge > of the system, nor do I see much possibility for it to do that. The > main benefit for us is that it's much easier to get a PostgreSQL based > solution deployed. Making it easier to get a PostgreSQL solution deployed is certainly a very worthwhile goal. > I'm curious if the community thinks this is a feature worth having? > Even considering that security experts would classify this kind of > encryption as a checkbox feature. I would say that some security experts would consider it a 'checkbox' feature, while others would say that it's actually a quite useful capability for a database to have and isn't just for being able to check a given box. I tended to lean towards the 'checkbox' camp and encouraged people to use filesystem encryption also, but there are use-cases where it'd be really nice to be able to have PG doing the encryption instead of the filesystem because then you can do things like backup the database, copy it somewhere else directly, and then restore it using the regular PG mechanisms, as long as you have access to the key. That's not something you can directly do with filesystem-level encryption (unless you happen to be lucky enough to be able to use ZFS, which can do exporting, or you can do a block-level exact copy to an exactly identically sized partition on the remote side or similar..), and while you could encrypt the PG files during the backup, that requires that you make sure both sides agree on how that encryption is done and have the same tools for performing the encryption/decryption. Possible, certainly, but not nearly as convenient. +1 for having this capability. Thanks! Stephen
On 6/13/17 09:24, Stephen Frost wrote: > but there are > use-cases where it'd be really nice to be able to have PG doing the > encryption instead of the filesystem because then you can do things like > backup the database, copy it somewhere else directly, and then restore > it using the regular PG mechanisms, as long as you have access to the > key. That's not something you can directly do with filesystem-level > encryption Interesting point. I wonder what the proper extent of "encryption at rest" should be. If you encrypt just on a file or block level, then someone looking at the data directory or a backup can still learn a number of things about the number of tables, transaction rates, various configuration settings, and so on. In the scenario of a sensitive application hosted on a shared SAN, I don't think that is good enough. Also, in the use case you describe, if you use pg_basebackup to make a direct encrypted copy of a data directory, I think that would mean you'd have to keep using the same key for all copies. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter, * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > I wonder what the proper extent of "encryption at rest" should be. If > you encrypt just on a file or block level, then someone looking at the > data directory or a backup can still learn a number of things about the > number of tables, transaction rates, various configuration settings, and > so on. In the scenario of a sensitive application hosted on a shared > SAN, I don't think that is good enough. If someone has access to the SAN, it'd be very difficult to avoid revealing some information about transaction rates or I/O throughput. Being able to have the configuration files encrypted would be good (thinking particularly about pg_hba.conf/pg_ident.conf) but I don't know that it's strictly necessary or that it would have to be done in an initial version. Certainly, there is a trade-off here when it comes to the information which someone can learn about the system by looking at the number and sizes of files from using PG-based encryption vs. what information someone can learn from being able to look at only an encrypted filesystem, but that's a trade-off which security experts are good at making a determination on and will be case-by-case, based on how easy setting up filesystem-encryption is in a particular environment and what the use-cases are for the system. > Also, in the use case you describe, if you use pg_basebackup to make a > direct encrypted copy of a data directory, I think that would mean you'd > have to keep using the same key for all copies. That's true, but that might be acceptable and possibly even desirable in certain cases. On the other hand, it would certainly be a useful feature to have a way to migrate from one key to another. Perhaps that would start out as an off-line tool, but maybe we'd be able to work out a way to support having it done on-line in the future (certainly non-trivial, but if we supported multiple keys concurrently with a preference for which key is used to write data back out, and required that checksums be in place to allow us to test if decrypting with a specific key worked ... lots more hand-waving here... ). Thanks! Stephen
On Mon, Jun 12, 2017 at 5:11 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> Fundamentally there doesn't seem to be a big benefit of implementing
> the encryption at PostgreSQL level instead of the filesystem. The
> patch doesn't take any real advantage from the higher level knowledge
> of the system, nor do I see much possibility for it to do that. The
> main benefit for us is that it's much easier to get a PostgreSQL based
> solution deployed.

I agree with all of that, but ease of deployment has some value unto
itself. I think pretty much every modern operating system has some way of
encrypting a filesystem, but it's different on Linux vs. Windows vs.
macOS vs. BSD, and you probably need to be the system administrator on
any of those systems in order to set it up. Something built into
PostgreSQL could run without administrator privileges and work the same
way on every platform we support. That would be useful.

Of course, what would be even more useful is fine-grained encryption -
encrypt these tables (and the corresponding indexes, toast tables, and
WAL records related to any of that) with this key, encrypt these other
tables (and the same list of associated stuff) with this other key, and
leave the rest unencrypted. The problem with that is that you probably
can't run recovery without all of the keys, and even on a clean startup
there would be a good deal of engineering work involved in refusing
access to tables whose key hadn't been provided yet. I don't think we
should wait to have this feature until all of those problems are solved.
In my opinion, something coarse-grained that just encrypts the whole
cluster would be a pretty useful place to start and would meet the needs
of enough people to be worthwhile all on its own. Performance is likely
to be poor on large databases, because every time a page transits between
shared_buffers and the buffer cache we've got to en/decrypt, but as long
as it's only poor for the people who opt into the feature I don't see a
big problem with that.

I anticipate that one of the trickier problems here will be handling
encryption of the write-ahead log. Suppose you encrypt WAL a block at a
time. In the current system, once you've written and flushed a block, you
can consider it durably committed, but if that block is encrypted, this
is no longer true. A crash might tear the block, making it impossible to
decrypt. Replay will therefore stop at the end of the previous block, not
at the last record actually flushed as would happen today. So, your
synchronous_commit suddenly isn't.

A similar problem will occur on any other page where we choose not to
protect against torn pages using full page writes. For instance, unless
checksums are enabled or wal_log_hints=on, we'll write a data page where
a single bit has been flipped and assume that the bit will either make it
to disk or not; the page can't really be torn in any way that hurts us.
But with encryption that's no longer true, because the hint bit will turn
into much more than a single bit flip, and rereading that page with half
old and half new contents will be the end of the world (TM). I don't know
off-hand whether we're protecting, say, CLOG page writes with FPWs:
because setting a couple of bits is idempotent and doesn't depend on the
existing page contents, we might not need it currently, but with
encryption, every bit in the page depends on every other bit in the page,
so we certainly would.
I don't know how many places we've got assumptions like this baked into the system, but I'm guessing there are a bunch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jun 13, 2017 at 11:35:03AM -0400, Robert Haas wrote:
> I anticipate that one of the trickier problems here will be handling
> encryption of the write-ahead log. Suppose you encrypt WAL a block at
> a time. In the current system, once you've written and flushed a
> block, you can consider it durably committed, but if that block is
> encrypted, this is no longer true. A crash might tear the block,
> making it impossible to decrypt. Replay will therefore stop at the
> end of the previous block, not at the last record actually flushed as
> would happen today. So, your synchronous_commit suddenly isn't. A
> similar problem will occur on any other page where we choose not to
> protect against torn pages using full page writes. For instance,
> unless checksums are enabled or wal_log_hints=on, we'll write a data
> page where a single bit has been flipped and assume that the bit will
> either make it to disk or not; the page can't really be torn in any
> way that hurts us. But with encryption that's no longer true, because
> the hint bit will turn into much more than a single bit flip, and
> rereading that page with half old and half new contents will be the
> end of the world (TM). I don't know off-hand whether we're
> protecting, say, CLOG page writes with FPWs: because setting a couple
> of bits is idempotent and doesn't depend on the existing page
> contents, we might not need it currently, but with encryption, every
> bit in the page depends on every other bit in the page, so we
> certainly would. I don't know how many places we've got assumptions
> like this baked into the system, but I'm guessing there are a bunch.

That is not necessarily true. You are describing a cipher mode where the
user data goes through the cipher, e.g. AES in CBC mode. However, if you
are using a stream cipher based on a block cipher, e.g. CTR or GCM, you
XOR the user data with a random bit stream, and in that case a one-bit
change in the user data would be a one-bit change in the cipher output.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
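To make the difference between the two claims concrete, here is a small
standalone demonstration (Python with the third-party cryptography
package; nothing here is from the patch) of how a single flipped
plaintext bit propagates under a stream-style mode like CTR versus a
tweaked block mode like the XTS mode used in the WIP patch:

    # pip install cryptography -- illustrative only
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt(key, mode, data):
        enc = Cipher(algorithms.AES(key), mode).encryptor()
        return enc.update(data) + enc.finalize()

    def diff_bytes(a, b):
        return sum(x != y for x, y in zip(a, b))

    page = os.urandom(8192)                  # stand-in for a heap page
    flipped = bytearray(page)
    flipped[100] ^= 0x01                     # simulate a hint-bit flip

    # Stream-style mode (CTR): only the byte containing the flipped bit changes.
    # (Reusing a CTR nonce like this is unsafe in practice; it is done here only
    # to show the XOR property.)
    ctr_key, nonce = os.urandom(16), os.urandom(16)
    c1 = encrypt(ctr_key, modes.CTR(nonce), bytes(page))
    c2 = encrypt(ctr_key, modes.CTR(nonce), bytes(flipped))
    print("CTR: bytes changed =", diff_bytes(c1, c2))      # -> 1

    # Tweaked block mode (XTS): the whole 16-byte AES block containing the bit
    # is scrambled.
    xts_key, tweak = os.urandom(32), os.urandom(16)        # AES-128-XTS key is 32 bytes
    x1 = encrypt(xts_key, modes.XTS(tweak), bytes(page))
    x2 = encrypt(xts_key, modes.XTS(tweak), bytes(flipped))
    print("XTS: bytes changed =", diff_bytes(x1, x2))      # -> ~16

In CTR exactly one ciphertext bit changes, as Bruce describes, while in
XTS the entire 16-byte AES block containing the flipped bit changes (and
in CBC everything from that block to the end of the page would change).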
On Tue, Jun 13, 2017 at 11:04:21AM -0400, Stephen Frost wrote:
> > Also, in the use case you describe, if you use pg_basebackup to make a
> > direct encrypted copy of a data directory, I think that would mean you'd
> > have to keep using the same key for all copies.
>
> That's true, but that might be acceptable and possibly even desirable in
> certain cases. On the other hand, it would certainly be a useful
> feature to have a way to migrate from one key to another. Perhaps that
> would start out as an off-line tool, but maybe we'd be able to work out
> a way to support having it done on-line in the future (certainly
> non-trivial, but if we supported multiple keys concurrently with a
> preference for which key is used to write data back out, and required
> that checksums be in place to allow us to test if decrypting with a
> specific key worked ... lots more hand-waving here... ).

As I understand it, having encryption in the database means the key is
stored in the database, while having encryption in the file system means
the key is stored in the operating system somewhere. Of course, if the
key stored in the database is visible to someone using the operating
system, we really haven't added much/any security --- I guess my point is
that the OS can easily hide the key from the database, but the database
can't easily hide the key from the operating system.

Of course, if the storage is split from the database server then having
the key on the database server seems like a win. However, I think a db
server could easily encrypt blocks before sending them to the SAN server.
This would not work for NAS, of course, since it is file-based.

I have to admit we tend to avoid heavy-API solutions that are designed
just to work around deployment challenges. Commercial databases are fine
in doing that, but it leads to very complex products.

I think the larger issue is where to store the key. I would love for us
to come up with a unified solution to that and then build encryption on
that, including all-cluster encryption.

One cool idea I have is using public key encryption to store the
encryption key for users who don't know the decryption key, e.g. RSA. It
would be a write-only encryption option. Not sure how useful that is, but
it is easily possible, and doesn't require us to keep the _encryption_
key secret, just the decryption one.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 11:04:21AM -0400, Stephen Frost wrote:
> > > Also, in the use case you describe, if you use pg_basebackup to make a
> > > direct encrypted copy of a data directory, I think that would mean you'd
> > > have to keep using the same key for all copies.
> >
> > That's true, but that might be acceptable and possibly even desirable in
> > certain cases. On the other hand, it would certainly be a useful
> > feature to have a way to migrate from one key to another. Perhaps that
> > would start out as an off-line tool, but maybe we'd be able to work out
> > a way to support having it done on-line in the future (certainly
> > non-trivial, but if we supported multiple keys concurrently with a
> > preference for which key is used to write data back out, and required
> > that checksums be in place to allow us to test if decrypting with a
> > specific key worked ... lots more hand-waving here... ).
>
> As I understand it, having encryption in the database means the key is
> stored in the database, while having encryption in the file system means
> the key is stored in the operating system somewhere.

Key management is an entirely independent discussion from this and the
proposal from Ants, as I understand it, is that the key would *not* be in
the database but could be anywhere that a shell command could get it
from, including possibly a HSM (hardware device).

Having the data encrypted by PostgreSQL does not mean the key is stored
in the database.

> Of course, if the
> key stored in the database is visible to someone using the operating
> system, we really haven't added much/any security --- I guess my point
> is that the OS can easily hide the key from the database, but the
> database can't easily hide the key from the operating system.

This is correct- the key must be available to the PostgreSQL process and
therefore someone with privileged access to the OS would be able to
retrieve the key, but that's also true of filesystem encryption.

Basically, if the server is doing the encryption and you have the ability
to read all memory on the server then you can get the key. Of course, if
you can read all memory then you can just look at shared buffers and you
don't really need to bother yourself with the key or the encryption, and
it doesn't make any difference if you're encrypting in the database or in
the filesystem. That attack vector is not one which this is intending to
address.

> Of course, if the storage is split from the database server then having
> the key on the database server seems like a win. However, I think a db
> server could easily encrypt blocks before sending them to the SAN
> server. This would not work for NAS, of course, since it is file-based.

The key doesn't necessarily have to be stored anywhere on the server- it
just needs to be kept in memory while the database process is running and
made available to the database at startup, unless an external system is
used to perform the encryption, which might be possible with an
extension, as discussed. In some environments, it might be acceptable to
have the key stored on the database server, of course, but there's no
requirement for the key to be stored on the database server or in the
database at all.

> I have to admit we tend to avoid heavy-API solutions that are designed
> just to work around deployment challenges. Commercial databases are
> fine in doing that, but it leads to very complex products.

I'm not following what you mean here.
> I think the larger issue is where to store the key. I would love for us > to come up with a unified solution to that and then build encryption on > that, including all-cluster encryption. Honestly, key management is something that I'd rather we *not* worry about in an initial implementation, which is one reason that I liked the approach discussed here of having a command which runs to provide the key. We could certainly look into improving that in the future, but key management really is a largely independent issue from encryption and it's much more difficult and complicated and whatever we come up with would still almost certainly be usable with the approach proposed here. > One cool idea I have is using public encryption to store the encryption > key by users who don't know the decryption key, e.g. RSA. It would be a > write-only encryption option. Not sure how useful that is, but it > easily possible, and doesn't require us to keep the _encryption_ key > secret, just the decryption one. The downside here is that asymmetric encryption is much more expensive than symmetric encryption and that probably makes it a non-starter. I do think we'll want to support multiple encryption methods and perhaps we can have an option where asymmetric encryption is used, but that's not what I expect will be typically used. Thanks! Stephen
On Tue, Jun 13, 2017 at 12:23 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Key management is an entirely independent discussion from this and the
> proposal from Ants, as I understand it, is that the key would *not* be
> in the database but could be anywhere that a shell command could get it
> from, including possibly a HSM (hardware device).

Yes. I think the right way to implement this is something like:

1. Have a GUC that runs a shell command to get the key.

2. If the command successfully gets the key, it prints it to stdout and
returns 0.

3. If it doesn't successfully get the key, it returns 1. The database can
retry or give up, whatever we decide to do.

That way, if the user wants to store the key in an unencrypted text file,
they can set encryption_key_command = 'cat /not/very/secure' and call it
a day. If they want to prompt the user on the console or request the key
from an HSM or get it in any other way, they just have to write the
appropriate shell script. We just provide mechanism, not policy, and the
user can adopt any policy they like, from an extremely insecure policy to
one suitable for Fort Knox.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
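A sketch of what the server side of that contract could look like (the
GUC name encryption_key_command comes from this message; the retry policy
and everything else below are assumptions for illustration):

    import subprocess

    def fetch_encryption_key(encryption_key_command: str, retries: int = 3) -> bytes:
        """Run the configured shell command; expect the key on stdout, exit status 0."""
        for _ in range(retries):
            proc = subprocess.run(encryption_key_command, shell=True,
                                  capture_output=True)
            if proc.returncode == 0 and proc.stdout.strip():
                return proc.stdout.strip()
            # Exit status 1 (or empty output) means the command could not get
            # the key; the server can retry or give up, whatever policy is
            # chosen here.
        raise RuntimeError("could not obtain encryption key via: "
                           + encryption_key_command)

    # The server only sees stdout and the exit status, so any policy works, e.g.:
    #   encryption_key_command = 'cat /not/very/secure'
    #   encryption_key_command = '/usr/local/bin/ask-hsm-for-key'  (hypothetical)
    key = fetch_encryption_key("cat /not/very/secure")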
On Tue, Jun 13, 2017 at 12:23:01PM -0400, Stephen Frost wrote:
> > As I understand it, having encryption in the database means the key is
> > stored in the database, while having encryption in the file system means
> > the key is stored in the operating system somewhere.
>
> Key management is an entirely independent discussion from this and the
> proposal from Ants, as I understand it, is that the key would *not* be
> in the database but could be anywhere that a shell command could get it
> from, including possibly a HSM (hardware device).
>
> Having the data encrypted by PostgreSQL does not mean the key is stored
> in the database.

Yes, I was just simplifying.

> > Of course, if the
> > key stored in the database is visible to someone using the operating
> > system, we really haven't added much/any security --- I guess my point
> > is that the OS can easily hide the key from the database, but the
> > database can't easily hide the key from the operating system.
>
> This is correct- the key must be available to the PostgreSQL process
> and therefore someone with privileged access to the OS would be able to
> retrieve the key, but that's also true of filesystem encryption.
>
> Basically, if the server is doing the encryption and you have the
> ability to read all memory on the server then you can get the key. Of
> course, if you can read all memory then you can just look at shared
> buffers and you don't really need to bother yourself with the key or
> the encryption, and it doesn't make any difference if you're encrypting
> in the database or in the filesystem. That attack vector is not one
> which this is intending to address.

My point is that if you have the key accessible to the database server,
both the database server and the OS have access to it. If you store it in
the OS, only the OS has access to it.

> > I have to admit we tend to avoid heavy-API solutions that are designed
> > just to work around deployment challenges. Commercial databases are
> > fine in doing that, but it leads to very complex products.
>
> I'm not following what you mean here.

By adding all-cluster encryption, we are re-implementing something the OS
does just fine, in most cases. We are going to have API overhead to do it
in the database, and historically we have avoided that.

> > One cool idea I have is using public key encryption to store the
> > encryption key for users who don't know the decryption key, e.g. RSA.
> > It would be a write-only encryption option. Not sure how useful that
> > is, but it is easily possible, and doesn't require us to keep the
> > _encryption_ key secret, just the decryption one.
>
> The downside here is that asymmetric encryption is much more expensive
> than symmetric encryption and that probably makes it a non-starter. I
> do think we'll want to support multiple encryption methods and perhaps
> we can have an option where asymmetric encryption is used, but that's
> not what I expect will be typically used.

Well, usually the symmetric key is stored using RSA and a symmetric
cipher is used to encrypt/decrypt the data. I was thinking of a case
where you encrypt a row using a symmetric key, then store RSA-encrypted
versions of the symmetric key so that only specific users can decrypt it
and get the key to decrypt the data.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
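Bruce's hybrid idea in miniature (a standalone sketch using Python's
cryptography package, not anything proposed for the patch): a random
symmetric key encrypts the data, and only an RSA-encrypted copy of that
key is stored for each role that should be able to read it; writing
requires only the public key.

    # pip install cryptography -- illustrative sketch of envelope encryption
    import os
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Each role that should be able to read the data has its own RSA key pair.
    role_priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    role_pub = role_priv.public_key()

    # Encrypt the row with a random symmetric key (AES-256-GCM here).
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    row = b"account=42, balance=100.00"
    ciphertext = AESGCM(data_key).encrypt(nonce, row, None)

    # Store an RSA-encrypted copy of the symmetric key for each authorized role.
    # Encrypting needs only the public key, which is the "write-only" property.
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    wrapped_key_for_role = role_pub.encrypt(data_key, oaep)

    # A role holding the private key recovers the symmetric key and the row.
    recovered_key = role_priv.decrypt(wrapped_key_for_role, oaep)
    assert AESGCM(recovered_key).decrypt(nonce, ciphertext, None) == row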
On 06/13/2017 09:28 AM, Robert Haas wrote: > On Tue, Jun 13, 2017 at 12:23 PM, Stephen Frost <sfrost@snowman.net> wrote: >> Key management is an entirely independent discussion from this and the >> proposal from Ants, as I understand it, is that the key would *not* be >> in the database but could be anywhere that a shell command could get it >> from, including possibly a HSM (hardware device). > > Yes. I think the right way to implement this is something like: > > 1. Have a GUC that runs a shell command to get the key. > > 2. If the command successfully gets the key, it prints it to stdout > and returns 0. > > 3. If it doesn't get successfully get the key, it returns 1. The > database can retry or give up, whatever we decide to do. > > That way, if the user wants to store the key in an unencrypted text > file, they can set the encryption_key_command = 'cat /not/very/secure' > and call it a day. If they want to prompt the user on the console or > request the key from an HSM or get it in any other way, they just have > to write the appropriate shell script. We just provide mechanism, not > policy, and the user can adopt any policy they like, from an extremely > insecure policy to one suitable for Fort Knox. Agreed, but as Bruce alluded to, we want this to be a master key, which is in turn used to encrypt the actual key, or keys, that are used to encrypt the data. The actual data encryption keys could be very long randomly generated binary, and there could be more than one of them (e.g. one per tablespace) in a file which is encrypted with the master key. This is more secure and allows, for example the master key to be changed without having to decrypt/re-encrypt the entire database. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
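A minimal sketch of that key hierarchy (standalone Python using the
cryptography package; the names and the one-key-per-tablespace layout are
illustrative assumptions): the master key obtained at startup only wraps
the randomly generated data keys, so rotating the master key re-wraps a
small key file instead of re-encrypting the cluster.

    # pip install cryptography -- sketch of a master-key / data-key hierarchy
    import os
    from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

    master_key = os.urandom(32)        # e.g. obtained via encryption_key_command

    # Long random data keys, e.g. one per tablespace; these encrypt the data.
    data_keys = {"pg_default": os.urandom(32), "secure_ts": os.urandom(32)}

    # What gets stored on disk: only the wrapped (master-key-encrypted) data keys.
    key_file = {name: aes_key_wrap(master_key, dk) for name, dk in data_keys.items()}

    # Changing the master key re-wraps this small file, not the whole database.
    new_master_key = os.urandom(32)
    rotated = {name: aes_key_wrap(new_master_key, aes_key_unwrap(master_key, w))
               for name, w in key_file.items()}
    assert aes_key_unwrap(new_master_key, rotated["secure_ts"]) == data_keys["secure_ts"]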
On Tue, Jun 13, 2017 at 09:55:10AM -0700, Joe Conway wrote:
> > That way, if the user wants to store the key in an unencrypted text
> > file, they can set encryption_key_command = 'cat /not/very/secure'
> > and call it a day. If they want to prompt the user on the console or
> > request the key from an HSM or get it in any other way, they just have
> > to write the appropriate shell script. We just provide mechanism, not
> > policy, and the user can adopt any policy they like, from an extremely
> > insecure policy to one suitable for Fort Knox.
>
> Agreed, but as Bruce alluded to, we want this to be a master key, which
> is in turn used to encrypt the actual key, or keys, that are used to
> encrypt the data. The actual data encryption keys could be very long
> randomly generated binary, and there could be more than one of them
> (e.g. one per tablespace) in a file which is encrypted with the master
> key. This is more secure and allows, for example, the master key to be
> changed without having to decrypt/re-encrypt the entire database.

Yes, thank you. Also, you can make multiple RSA-encrypted copies of the
symmetric key, one for each role that you want to be able to view the
data. And good point on the ability to change the RSA key/password
without having to reencrypt the data.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jun 13, 2017 at 12:23:01PM -0400, Stephen Frost wrote:
> > > Of course, if the
> > > key stored in the database is visible to someone using the operating
> > > system, we really haven't added much/any security --- I guess my point
> > > is that the OS can easily hide the key from the database, but the
> > > database can't easily hide the key from the operating system.
> >
> > This is correct- the key must be available to the PostgreSQL process
> > and therefore someone with privileged access to the OS would be able to
> > retrieve the key, but that's also true of filesystem encryption.
> >
> > Basically, if the server is doing the encryption and you have the
> > ability to read all memory on the server then you can get the key. Of
> > course, if you can read all memory then you can just look at shared
> > buffers and you don't really need to bother yourself with the key or
> > the encryption, and it doesn't make any difference if you're encrypting
> > in the database or in the filesystem. That attack vector is not one
> > which this is intending to address.
>
> My point is that if you have the key accessible to the database server,
> both the database server and the OS have access to it. If you store it
> in the OS, only the OS has access to it.

I understand that, but that's not a particularly interesting distinction
as the database server must, necessarily, have access to the data in the
database.

> > > I have to admit we tend to avoid heavy-API solutions that are designed
> > > just to work around deployment challenges. Commercial databases are
> > > fine in doing that, but it leads to very complex products.
> >
> > I'm not following what you mean here.
>
> By adding all-cluster encryption, we are re-implementing something the
> OS does just fine, in most cases. We are going to have API overhead to
> do it in the database, and historically we have avoided that.

We do try to avoid the overhead of additional function calls, in places
where it matters. As this is all about reading and writing data, the
overhead of the additional check to see if we're doing encryption or not
is unlikely to be interesting.

> > > One cool idea I have is using public key encryption to store the
> > > encryption key for users who don't know the decryption key, e.g. RSA.
> > > It would be a write-only encryption option. Not sure how useful that
> > > is, but it is easily possible, and doesn't require us to keep the
> > > _encryption_ key secret, just the decryption one.
> >
> > The downside here is that asymmetric encryption is much more expensive
> > than symmetric encryption and that probably makes it a non-starter. I
> > do think we'll want to support multiple encryption methods and perhaps
> > we can have an option where asymmetric encryption is used, but that's
> > not what I expect will be typically used.
>
> Well, usually the symmetric key is stored using RSA and a symmetric
> cipher is used to encrypt/decrypt the data. I was thinking of a case
> where you encrypt a row using a symmetric key, then store RSA-encrypted
> versions of the symmetric key so that only specific users can decrypt
> it and get the key to decrypt the data.

This goes back to key management and I agree that it often makes sense to
use RSA or similar to encrypt the symmetric key, and this approach would
allow the user to do so.
That doesn't actually give you a "write-only" encryption option though, since any user who can decrypt the symmetric key is able to use the symmetric key for both encryption and decryption, and someone who only has access to the RSA encryption key can't actually encrypt the data since they can't access the symmetric key. Thanks! Stephen
Bruce, Joe, * Bruce Momjian (bruce@momjian.us) wrote: > On Tue, Jun 13, 2017 at 09:55:10AM -0700, Joe Conway wrote: > > > That way, if the user wants to store the key in an unencrypted text > > > file, they can set the encryption_key_command = 'cat /not/very/secure' > > > and call it a day. If they want to prompt the user on the console or > > > request the key from an HSM or get it in any other way, they just have > > > to write the appropriate shell script. We just provide mechanism, not > > > policy, and the user can adopt any policy they like, from an extremely > > > insecure policy to one suitable for Fort Knox. > > > > Agreed, but as Bruce alluded to, we want this to be a master key, which > > is in turn used to encrypt the actual key, or keys, that are used to > > encrypt the data. The actual data encryption keys could be very long > > randomly generated binary, and there could be more than one of them > > (e.g. one per tablespace) in a file which is encrypted with the master > > key. This is more secure and allows, for example the master key to be > > changed without having to decrypt/re-encrypt the entire database. > > Yes, thank you. Also, you can make multiple RSA-encrypted copies of the > symetric key, one for each role you want to view the data. And good > point on the ability to change the RSA key/password without having to > reencrypt the data. There's nothing in this proposal that prevents the user from using a very long randomly generated binary key. We aren't talking about prompting the user for a password unless that's what they decide the shell script should do, and if they do then that's their choice. Let us, please, stop stressing over the right way to do key management as part of this discussion about providing encryption. The two are different things and we do not need to solve both at once. Further, yes, we will definitely want to get to a point where we can encrypt subsets of the system in different ways, but that doesn't have to be done in the first implementation either. Thanks! Stephen
On 06/13/2017 10:05 AM, Stephen Frost wrote: > Bruce, Joe, > > * Bruce Momjian (bruce@momjian.us) wrote: >> On Tue, Jun 13, 2017 at 09:55:10AM -0700, Joe Conway wrote: >> > > That way, if the user wants to store the key in an unencrypted text >> > > file, they can set the encryption_key_command = 'cat /not/very/secure' >> > > and call it a day. If they want to prompt the user on the console or >> > > request the key from an HSM or get it in any other way, they just have >> > > to write the appropriate shell script. We just provide mechanism, not >> > > policy, and the user can adopt any policy they like, from an extremely >> > > insecure policy to one suitable for Fort Knox. >> > >> > Agreed, but as Bruce alluded to, we want this to be a master key, which >> > is in turn used to encrypt the actual key, or keys, that are used to >> > encrypt the data. The actual data encryption keys could be very long >> > randomly generated binary, and there could be more than one of them >> > (e.g. one per tablespace) in a file which is encrypted with the master >> > key. This is more secure and allows, for example the master key to be >> > changed without having to decrypt/re-encrypt the entire database. >> >> Yes, thank you. Also, you can make multiple RSA-encrypted copies of the >> symetric key, one for each role you want to view the data. And good >> point on the ability to change the RSA key/password without having to >> reencrypt the data. > > There's nothing in this proposal that prevents the user from using a > very long randomly generated binary key. We aren't talking about > prompting the user for a password unless that's what they decide the > shell script should do, unless the user decides to do that and if they > do then that's their choice. Except shell escaping issues, etc, etc > Let us, please, stop stressing over the right way to do key management > as part of this discussion about providing encryption. The two are > different things and we do not need to solve both at once. Not stressing, but this is an important part of the design and should be done correctly. It is also very simple, so should not be hard to add. > Further, yes, we will definitely want to get to a point where we can > encrypt subsets of the system in different ways, but that doesn't have > to be done in the first implementation either. No, it doesn't, but that doesn't change the utility of doing it this way from the start. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
On Tue, Jun 13, 2017 at 01:01:32PM -0400, Stephen Frost wrote: > > Well, usually the symetric key is stored using RSA and a symetric > > cipher is used to encrypt/decrypt the data. I was thinking of a case > > where you encrypt a row using a symetric key, then store RSA-encrypted > > versions of the symetric key encrypted that only specific users could > > decrypt and get the key to decrypt the data. > > This goes back to key management and I agree that it often makes sense > to use RSA or similar to encrypt the symmetric key, and this approach > would allow the user to do so. That doesn't actually give you a > "write-only" encryption option though, since any user who can decrypt > the symmetric key is able to use the symmetric key for both encryption > and decryption, and someone who only has access to the RSA encryption > key can't actually encrypt the data since they can't access the > symetric key. I think the big win of Postgres doing the encryption is that the user-visible file system is no longer a target (assuming OS permissions are bypassed), while for file system encryption it is the storage device that is encrypted. My big question is how many times are the OS permissions bypassed in a way that would also not expose the db cluster's key or db data? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Joe, * Joe Conway (mail@joeconway.com) wrote: > Except shell escaping issues, etc, etc That's not an issue- we're talking about reading the stdout of some other process, there's no shell escaping that has to be done there. > > Let us, please, stop stressing over the right way to do key management > > as part of this discussion about providing encryption. The two are > > different things and we do not need to solve both at once. > > Not stressing, but this is an important part of the design and should be > done correctly. It is also very simple, so should not be hard to add. I disagree that proper key management is "simple". If we really get to a point where we think we have a simple answer to it then perhaps that can be implemented in addition to the encryption piece in the same release cycle- but they certainly don't need to be in the same patch, nor do we need to make good key management a requirement for adding encryption support. > > Further, yes, we will definitely want to get to a point where we can > > encrypt subsets of the system in different ways, but that doesn't have > > to be done in the first implementation either. > > No, it doesn't, but that doesn't change the utility of doing it this way > from the start. No, but it seriously changes the level of complexity. I feel like we're trying to go from zero to light speed here because there's an idea that it's "simple" to add X, Y or Z additional requirement beyond the basic feature, but we don't have anything yet. I continue to be of the feeling that we should start simple and keep it to the basic feature first and make sure that we can actually get that right before we start looking into adding on additional bits. Thanks! Stephen
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Tue, Jun 13, 2017 at 01:01:32PM -0400, Stephen Frost wrote: > > > Well, usually the symetric key is stored using RSA and a symetric > > > cipher is used to encrypt/decrypt the data. I was thinking of a case > > > where you encrypt a row using a symetric key, then store RSA-encrypted > > > versions of the symetric key encrypted that only specific users could > > > decrypt and get the key to decrypt the data. > > > > This goes back to key management and I agree that it often makes sense > > to use RSA or similar to encrypt the symmetric key, and this approach > > would allow the user to do so. That doesn't actually give you a > > "write-only" encryption option though, since any user who can decrypt > > the symmetric key is able to use the symmetric key for both encryption > > and decryption, and someone who only has access to the RSA encryption > > key can't actually encrypt the data since they can't access the > > symmetric key. > > I think the big win of Postgres doing the encryption is that the > user-visible file system is no longer a target (assuming OS permissions > are bypassed), while for file system encryption it is the storage device > that is encrypted. If OS permissions are bypassed then the encryption isn't going to help because the attacker can just access shared memory. The big wins for doing the encryption in PostgreSQL are, as Robert and I have both mentioned on this thread already, that it provides data-at-rest encryption in an easier to deploy fashion which will work the same across different systems and allows the encrypted cluster to be transferred more easily between systems. There are almost certainly other wins from having PG do the encryption, but the above strikes me as the big ones, and those are certainly valuable enough on their own for us to seriously consider adding this capability. > My big question is how many times are the OS permissions bypassed in a > way that would also not expose the db clusters key or db data? This is not the attack vector that this solution is attempting to address, so there really isn't much point in discussing it on this thread. Thanks! Stephen
On 06/13/2017 10:20 AM, Stephen Frost wrote: > * Joe Conway (mail@joeconway.com) wrote: >> Except shell escaping issues, etc, etc > > That's not an issue- we're talking about reading the stdout of some > other process, there's no shell escaping that has to be done there. It could be an issue depending on how the user stores their master key. > I disagree that proper key management is "simple". If we really get to > a point where we think we have a simple answer to it then perhaps that > can be implemented in addition to the encryption piece in the same > release cycle- but they certainly don't need to be in the same patch, > nor do we need to make good key management a requirement for adding > encryption support. I never said key management was simple. Indeed it is the most complex and hazardous part of all this as you said earlier. What is simple is implementing a master key encrypting actual keys scheme. Keeping the user's master key management out of this design is unchanged by what I proposed, and what I proposed is a superior yet simple method. Yes, it can be done separately but what is the point? We should at least discuss it as part of the design. > No, but it seriously changes the level of complexity. I feel like we're > trying to go from zero to light speed here because there's an idea that > it's "simple" to add X, Y or Z additional requirement beyond the basic > feature, but we don't have anything yet. I think that is hyperbole. It does not significantly add to the complexity of what is being discussed. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
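A minimal sketch of the wrapping scheme Joe describes, using only OpenSSL (file names and key length are illustrative, not part of any proposal):

    openssl rand -out ts1.dek 32                               # random 256-bit data encryption key
    openssl enc -aes-256-cbc -salt -kfile master.passphrase \
        -in ts1.dek -out ts1.dek.enc                           # wrap it under the master key
    shred -u ts1.dek                                           # keep only the wrapped copy on disk
    openssl enc -d -aes-256-cbc -kfile master.passphrase \
        -in ts1.dek.enc                                        # unwrap to stdout when needed

Changing the master passphrase later means re-wrapping a few small .dek.enc files rather than re-encrypting the data they protect.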
On Tue, Jun 13, 2017 at 01:25:00PM -0400, Stephen Frost wrote: > > I think the big win of Postgres doing the encryption is that the > > user-visible file system is no longer a target (assuming OS permissions > > are bypassed), while for file system encryption it is the storage device > > that is encrypted. > > If OS permissions are bypassed then the encryption isn't going to help > because the attacker can just access shared memory. > > The big wins for doing the encryption in PostgreSQL are, as Robert and I > have both mentioned on this thread already, that it provides > data-at-rest encryption in an easier to deploy fashion which will work > the same across different systems and allows the encrypted cluster to be > transferred more easily between systems. There are almsot certainly > other wins from having PG do the encryption, but the above strikes me as > the big ones, and those are certainly valuable enough on their own for > us to seriously consider adding this capability. Since you seem to be trying to shut down discussion, I will simply say I am unimpressed that this use-case is sufficient justification to add the feature. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Joe, * Joe Conway (mail@joeconway.com) wrote: > On 06/13/2017 10:20 AM, Stephen Frost wrote: > > * Joe Conway (mail@joeconway.com) wrote: > >> Except shell escaping issues, etc, etc > > > > That's not an issue- we're talking about reading the stdout of some > > other process, there's no shell escaping that has to be done there. > > It could be an issue depending on how the user stores their master key. ... eh? The user gives us a command to run, we run it, it spits out some binary blob to stdout which we read in and use as the key. I don't see where in that there's any need to be concerned about shell escaping issues. > > I disagree that proper key management is "simple". If we really get to > > a point where we think we have a simple answer to it then perhaps that > > can be implemented in addition to the encryption piece in the same > > release cycle- but they certainly don't need to be in the same patch, > > nor do we need to make good key management a requirement for adding > > encryption support. > > I never said key management was simple. Indeed it is the most complex > and hazardous part of all this as you said earlier. What is simple is > implementing a master key encrypting actual keys scheme. Keeping the > user's master key management out of this design is unchanged by what I > proposed, and what I proposed is a superior yet simple method. Yes, it > can be done separately but what is the point? We should at least discuss > it as part of the design. The point is that we haven't got any encryption of any kind and you're suggesting we introduce key management which you agree isn't simple. That you're trying to argue that it actually is simple because it's just <<description of a key management system>> is a bit bizarre to me. > > No, but it seriously changes the level of complexity. I feel like we're > > trying to go from zero to light speed here because there's an idea that > > it's "simple" to add X, Y or Z additional requirement beyond the basic > > feature, but we don't have anything yet. > > I think that is hyperbole. It does not significantly add to the > complexity of what is being discussed. If I stipulate that it's, indeed, simple to implement a system where we have a master key and other keys- where are those other keys going to be kept (even if they're encrypted)? How many extra keys are we talking about? When are those keys going to be used and how do we know what key to use when? If we're going to do per-tablespace or per-table encryption, how are we going to handle the WAL for that? Will we have an independent key for WAL (in which case, what's the point of using different keys for tables, et al, when all the data is in the WAL?)? Having a single key which is used cluster-wide is extremely simple and lets us get some form of encryption first before we try to tackle the more complicated multi-key/partial-encryption system. Just to be clear, I don't have any issue with discussing the idea that we want to get to a point where we can work with multiple keys and encrypt different tables with different keys (or not encrypt certain tables, et al) with the goal of implementing the single-key approach in a way that allows us to expand on it down the road easily, I just don't think we need to have it all done in the very first patch which adds the ability to encrypt the data files. Maybe you're not saying that it has to be included in the first implementation, in which case we seem to just be talking past each other, but that isn't the impression I got.. Thanks! Stephen
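The stdout contract Stephen describes keeps the server side trivial. For illustration, two possible settings for the encryption_key_command being discussed (both hypothetical; each command simply prints the key material on stdout):

    # prompt the operator at startup
    encryption_key_command = 'systemd-ask-password "PostgreSQL cluster key:"'

    # or unwrap a stored, master-passphrase-protected data key
    encryption_key_command = 'openssl enc -d -aes-256-cbc -kfile /etc/postgresql/master.pass -in /etc/postgresql/cluster.key.enc'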
On Tue, Jun 13, 2017 at 01:44:51PM -0400, Stephen Frost wrote: > Just to be clear, I don't have any issue with discussing the idea that > we want to get to a point where we can work with multiple keys and > encrypt different tables with different keys (or not encrypt certain > tables, et al) with the goal of implementing the single-key approach in > a way that allows us to expand on it down the road easily, I just don't > think we need to have it all done in the very first patch which adds the > ability to encrypt the data files. Maybe you're not saying that it has > to be included in the first implementation, in which case we seem to > just be talking past each other, but that isn't the impression I got.. We don't want to implement all-cluster encryption with a simple user API and then realize we need another API for later encryption features, do we? And we are not going to know that if we don't talk about it, but hey, this is just an email thread and I can marshal opposition to the feature later when it appears, and point this all out again. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Tue, Jun 13, 2017 at 01:44:51PM -0400, Stephen Frost wrote: > > Just to be clear, I don't have any issue with discussing the idea that > > we want to get to a point where we can work with multiple keys and > > encrypt different tables with different keys (or not encrypt certain > > tables, et al) with the goal of implementing the single-key approach in > > a way that allows us to expand on it down the road easily, I just don't > > think we need to have it all done in the very first patch which adds the > > ability to encrypt the data files. Maybe you're not saying that it has > > to be included in the first implementation, in which case we seem to > > just be talking past each other, but that isn't the impression I got.. > > We don't want to implement all-cluster encryption with a simple user API > and then realize we need another API for later encryption features, do > we? I actually suspect that's exactly where we'll end up- but due to necessity rather than because there's a way to avoid it. We are going to want to encrypt cluster-wide components of the system (shared catalogs, WAL, etc) and that means that we have to have a key provided very early on. That's a very different thing from being able to do something like encrypt specific tables, tablespaces, etc, where the key can be provided much later and we'll want to allow users to use SQL or the libpq protocol to be able to specify what to encrypt and possibly even provide the encryption keys. That said, the approach outlined here could be used for both by expanding on the command string, perhaps passing it a keyid which is what we store in the catalog to indicate what key a table is encrypted with and then the keyid is "%k" in the command string and the command has to return the specified key for us to decrypt the table. That would involve adding a new catalog table to identify keys and their keyids, I'd think, and an additional column in pg_class which specifies the key (or perhaps we'd just have a new catalog table that says what tables are encrypted in what way). Thanks! Stephen
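The keyid variant Stephen sketches might look something like this (entirely hypothetical names; nothing here is settled):

    # postgresql.conf: %k would be replaced with the key id recorded in the catalog
    encryption_key_command = '/usr/local/bin/pg_fetch_key %k'

    #!/bin/sh
    # /usr/local/bin/pg_fetch_key (hypothetical): emit the unwrapped key
    # for key id $1 on stdout
    exec openssl enc -d -aes-256-cbc -kfile /etc/pgkeys/master.pass \
        -in "/etc/pgkeys/$1.dek.enc"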
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Tue, Jun 13, 2017 at 01:25:00PM -0400, Stephen Frost wrote: > > > I think the big win of Postgres doing the encryption is that the > > > user-visible file system is no longer a target (assuming OS permissions > > > are bypassed), while for file system encryption it is the storage device > > > that is encrypted. > > > > If OS permissions are bypassed then the encryption isn't going to help > > because the attacker can just access shared memory. > > > > The big wins for doing the encryption in PostgreSQL are, as Robert and I > > have both mentioned on this thread already, that it provides > > data-at-rest encryption in an easier to deploy fashion which will work > > the same across different systems and allows the encrypted cluster to be > > transferred more easily between systems. There are almsot certainly > > other wins from having PG do the encryption, but the above strikes me as > > the big ones, and those are certainly valuable enough on their own for > > us to seriously consider adding this capability. > > Since you seem to be trying to shut down discussion, I will simply say I > am unimpressed that this use-case is sufficient justification to add the > feature. I'm not trying to shut down discussion, I'm simply pointing out where this feature will be helpful and where it won't be. If there's a way to make it better and able to address an attack where the OS permission system is bypassed, that'd be great, but I certainly don't know of any way to do that and we don't want to claim that this feature will protect against an attack vector that it won't. If the lack of that means you don't support the feature, that's unfortunate as it seems to imply, to me at least, that we'll never have any kind of encryption because there's no way for it to prevent attacks where the OS permission system is able to be bypassed. Thanks! Stephen
On Tue, Jun 13, 2017 at 02:23:39PM -0400, Stephen Frost wrote: > I'm not trying to shut down discussion, I'm simply pointing out where > this feature will be helpful and where it won't be. If there's a way to > make it better and able to address an attack where the OS permission > system is bypassed, that'd be great, but I certainly don't know of any > way to do that and we don't want to claim that this feature will protect > against an attack vector that it won't. > > If the lack of that means you don't support the feature, that's > unfortunate as it seems to imply, to me at least, that we'll never have > any kind of encryption because there's no way for it to prevent attacks > where the OS permission system is able to be bypassed. It means if we can't discuss the actual benefits that this feature brings, and doesn't bring, and how it will deal with future feature additions, then you are right we will never have it. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Tue, Jun 13, 2017 at 02:23:39PM -0400, Stephen Frost wrote: > > I'm not trying to shut down discussion, I'm simply pointing out where > > this feature will be helpful and where it won't be. If there's a way to > > make it better and able to address an attack where the OS permission > > system is bypassed, that'd be great, but I certainly don't know of any > > way to do that and we don't want to claim that this feature will protect > > against an attack vector that it won't. > > > > If the lack of that means you don't support the feature, that's > > unfortunate as it seems to imply, to me at least, that we'll never have > > any kind of encryption because there's no way for it to prevent attacks > > where the OS permission system is able to be bypassed. > > It means if we can't discuss the actual benefits that this feature > brings, and doesn't bring, and how it will deal with future feature > additions, then you are right we will never have it. I apologize for having come across as trying to shut down discussion, that was not my intent. It's good to discuss what the feature would bring and what cases it doesn't cover, as well as discussing how it can be designed to make sure that later improvements are able to be done without having to change it around. I do think it's a good idea for us to consider taking an incremental approach where we're adding pieces and building things up as we go. I'm concerned that if we try to do too much in the initial implementation that we'll end up not having anything. As it relates to the different attack vectors that this would address, it's primarily the same ones which filesystem-level encryption also addresses, but it's an improvement when it comes to ease of use. Unfortunately, it won't address cases where the OS is compromised. Thanks! Stephen
On Tue, Jun 13, 2017 at 02:38:58PM -0400, Stephen Frost wrote: > It's good to discuss what the feature would bring and what cases it > doesn't cover, as well as discussing how it can be designed to make sure > that later improvements are able to be done without having to change it > around. I do think it's a good idea for us to consider taking an > incremental approach where we're adding pieces and building things up as > we go. I'm concerned that if we try to do too much in the initial > implementation that we'll end up not having anything. > > As it relates to the different attack vectors that this would address, > it's primairly the same ones which filesystem-level encryption also > addresses, but it's an improvement when it comes to ease of use. > Unfortunately, it won't address cases where the OS is compromised. OK, so let's go back. You are saying there are no security benefits to this vs. file system encryption. The benefit is allowing configuration in the database rather than the OS? You stated you can transfer db-level encrypted files between servers, but can't you do that anyway? Is the problem that you have to encrypt before sending and decrypt on arrival, if you don't trust the transmission link? Is that used a lot? Is having the db encrypt every write a reasonable solution to that? As far as future features, we don't have to add all the features at this time, but if someone has a good idea for an API and we can make it work easily while adding this feature, why wouldn't we do that? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Tue, Jun 13, 2017 at 02:38:58PM -0400, Stephen Frost wrote: > > It's good to discuss what the feature would bring and what cases it > > doesn't cover, as well as discussing how it can be designed to make sure > > that later improvements are able to be done without having to change it > > around. I do think it's a good idea for us to consider taking an > > incremental approach where we're adding pieces and building things up as > > we go. I'm concerned that if we try to do too much in the initial > > implementation that we'll end up not having anything. > > > > As it relates to the different attack vectors that this would address, > > it's primairly the same ones which filesystem-level encryption also > > addresses, but it's an improvement when it comes to ease of use. > > Unfortunately, it won't address cases where the OS is compromised. > > OK, so let's go back. You are saying there are no security benefits to > this vs. file system encryption. I'm not sure that I can see any, myself.. Perhaps I'm wrong there, but it seems unlikely that this would be an improvement over filesystem level encryption in the general sense. I'm not sure that I see it as really worse than filesystem-level encryption either, to be clear. There's a bit of increased information leakage, as Peter mentioned and I agreed with, but it's not a lot and I expect in most cases that information leak would be acceptable. That seems like something which would need to be considered on a case-by-case basis. > The benefit is allowing configuration > in the database rather than the OS? No, the benefit is that the database administrator can configure it and set it up and not have to get an OS-level administrator involved. There may also be other reasons why filesystem-level encryption is difficult to set up or use in a certain environment, but this wouldn't depend on anything OS-related and therefore could be done. > You stated you can transfer > db-level encrypted files between servers, but can't you do that anyway? If the filesystem is encrypted and you wanted to transfer the entire cluster from one system to another, keeping it encrypted with the same key, you'd have to transfer the entire filesystem at a block level. That's not typically very easy to do (ZFS, specifically, has this capability where you can export a filesystem and send it from one machine to another, but I don't know of any other filesystems which do and ZFS isn't always an option..). You could go through a process of re-encrypting the files prior to transferring them, or deciding that simply having the transport mechanism encrypted is sufficient (eg: SSH), but if what you really want to do is keep the existing encryption of the database and transfer it to another system, this allows that pretty easily. For example, you could simply do: cp -a /path/to/PG /mnt/usb and you're done. If you're using filesystem level encryption then you'd have to re-encrypt the data, using something like: tar -cf - /path/to/PG | openssl enc -aes-256-cbc -salt -kfile private.key > /mnt/usb/encrypted_cluster.tar And then you would need openssl on the other system to decrypt it. Of course, either way you'd have to provide for a way to get the key from one system to the other. Also, tools like pg_basebackup could be used, with nearly zero changes, I think, to get an encrypted copy of the database for backup purposes, removing the need to work out a way to handle encrypting backups.
> Is the problem that you have to encrypt before sending and decrypt on > arrival, if you don't trust the transmission link? Is that used a lot? > Is having the db encrypt every write a reasonable solution to that? There's multiple use-cases here. Making it easier to copy the database is just one of them and it isn't the biggest one. The biggest benefit is that there's cases where filesystem-level encryption isn't really an option or, even if it is, it's not desirable for other reasons. > As far as future features, we don't have to add the all features at this > time, but if someone has a good idea for an API and we can make it work > easily while adding this feature, why wouldn't we do that? Sure, I'm all for specific suggestions about how to improve the API, or just in general recommendations of how to improve the patch. The suggestions which have been made about key management don't really come across to me as specific API-level recommendations but rather "this would also be nice to have" kind of comments, which isn't really the same. Thanks! Stephen
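If pg_basebackup really can be used unchanged, as suggested above, the backup workflow would be as simple as the following (speculative; whether WAL and other cluster-wide files come out encrypted depends on details of the patch):

    pg_basebackup -h db1 -U replication -D /backups/pgdata -Fp
    # under the proposed design the relation files in /backups/pgdata would
    # still be encrypted; only the key needs separate, careful handling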
On 6/13/17 15:20, Stephen Frost wrote: > No, the benefit is that the database administrator can configure it and > set it up and not have to get an OS-level administrator involved. There > may also be other reasons why filesystem-level encryption is difficult > to set up or use in a certain environment, but this wouldn't depend on > anything OS-related and therefore could be done. Let's see a proposal in those terms then. How easy can you make it, compared to existing OS-level solutions, and will that justify the maintenance overhead? Considering how ubiquitous file-system encryption is, I have my doubts that the trade-offs will come out right, but let's see. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 6/13/17 15:20, Stephen Frost wrote: > For example, you could simply do: > > cp -a /path/to/PG /mnt/usb > > and you're done. If you're using filesystem level encryption then you'd > have to re-encrypt the data, using something like: > > tar -cf - /path/to/PG | openssl -key private.key > /mnt/usb/encrypted_cluster.tar > > And then you would need openssl on the other system to decrypt it. Or make the USB file system encrypted as well? If you're in that kind of environment, that would surely be feasible, if not required. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 13, 2017 at 03:20:12PM -0400, Stephen Frost wrote: > Bruce, > > * Bruce Momjian (bruce@momjian.us) wrote: > > On Tue, Jun 13, 2017 at 02:38:58PM -0400, Stephen Frost wrote: > > > It's good to discuss what the feature would bring and what cases it > > > doesn't cover, as well as discussing how it can be designed to make sure > > > that later improvements are able to be done without having to change it > > > around. I do think it's a good idea for us to consider taking an > > > incremental approach where we're adding pieces and building things up as > > > we go. I'm concerned that if we try to do too much in the initial > > > implementation that we'll end up not having anything. > > > > > > As it relates to the different attack vectors that this would address, > > > it's primairly the same ones which filesystem-level encryption also > > > addresses, but it's an improvement when it comes to ease of use. > > > Unfortunately, it won't address cases where the OS is compromised. > > > > OK, so let's go back. You are saying there are no security benefits to > > this vs. file system encryption. > > I'm not sure that I can see any, myself.. Perhaps I'm wrong there, but > it seems unlikely that this would be an improvement over filesystem > level encryption in the general sense. I'm not sure that I see it as > really worse than filesystem-level encryption either, to be clear. > There's a bit of increased information leakage, as Peter mentioned and I > agreed with, but it's not a lot and I expect in most cases that > information leak would be acceptable. That seems like something which > would need to be considered on a case-by-case basis. Isn't the leakage controlled by OS permissions, so is it really leakage, i.e., if you can see the leakage, you probably have bypassed the OS permissions and see the key and data anyway. > > The benefit is allowing configuration > > in the database rather than the OS? > > No, the benefit is that the database administrator can configure it and > set it up and not have to get an OS-level administrator involved. There > may also be other reasons why filesystem-level encryption is difficult > to set up or use in a certain environment, but this wouldn't depend on > anything OS-related and therefore could be done. OK, my only point here is that we are going down a slippery slope of implementing OS things in the database. There is nothing wrong with that but it has often been something we have avoided, because of the added complexity required in the db server. As a counter-example, we only added an external collation library to Postgres when we clearly saw a benefit, e.g. detecting collation changes. > > You stated you can transfer > > db-level encrypted files between servers, but can't you do that anyway? > > If the filesystem is encrypted and you wanted to transfer the entire > cluster from one system to another, keeping it encrypted with the same > key, you'd have to transfer the entire filesystem at a block level. > That's not typically very easy to do (ZFS, specifically, has this > capability where you can export a filesystem and send it from one > machine to another, but I don't know of any other filesystems which do > and ZFS isn't always an option..). 
> > You could go through a process of re-encrypting the files prior to > transferring them, or deciding that simply having the transport > mechanism encrypted is sufficient (eg: SSH), but if what you really want > to do is keep the existing encryption of the database and transfer it to > another system, this allows that pretty easily. > > For example, you could simply do: > > cp -a /path/to/PG /mnt/usb > > and you're done. If you're using filesystem level encryption then you'd > have to re-encrypt the data, using something like: > > tar -cf - /path/to/PG | openssl enc -aes-256-cbc -salt -kfile private.key > /mnt/usb/encrypted_cluster.tar > > And then you would need openssl on the other system to decrypt it. > > Of course, either way you'd have to provide for a way to get the key > from one system to the other. Uh, doesn't scp do this? I have trouble seeing how avoiding calling openssl justifies changes to our database server. > Also, tools like pg_basebackup could be used, with nearly zero changes, > I think, to get an encrypted copy of the database for backup purposes, > removing the need to work out a way to handle encrypting backups. I do think we need much more documentation on how to encrypt things, though that is a separate issue. It might help to document how you _should_ do things now to see the limitations we have. > > Is the problem that you have to encrypt before sending and decrypt on > > arrival, if you don't trust the transmission link? Is that used a lot? > > Is having the db encrypt every write a reasonable solution to that? > > There's multiple use-cases here. Making it easier to copy the database > is just one of them and it isn't the biggest one. The biggest benefit > is that there's cases where filesystem-level encryption isn't really an > option or, even if it is, it's not desirable for other reasons. I am thinking of NAS storage you don't trust, though there is the leakage there. > > As far as future features, we don't have to add the all features at this > > time, but if someone has a good idea for an API and we can make it work > > easily while adding this feature, why wouldn't we do that? > > Sure, I'm all for specific suggestions about how to improve the API, or > just in general recommendations of how to improve the patch. The > suggestions which have been made about key management don't really come > across to me as specific API-level recommendations but rather "this > would also be nice to have" kind of comments, which isn't really the > same. They are related, or you can't even know that because you are/were preventing the discussion from happening. As an example, full-cluster encryption doesn't really add any new capabilities, maybe just easier deployment for some users. Where things get useful is fine-grained encryption, which file system-level encryption can't do. Also, has anyone asked users if they would find db-encryption better than file system encryption? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 6/13/17 15:51, Bruce Momjian wrote: > Isn't the leakage controlled by OS permissions, so is it really leakage, > i.e., if you can see the leakage, you probably have bypassed the OS > permissions and see the key and data anyway. One scenario (among many) is when you're done with the disk. If the content was fully encrypted, then you can just throw it into the trash or have your provider dispose of it or reuse it. If not, then, depending on policy, you will have to physically obtain it and burn it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 13, 2017 at 04:08:29PM -0400, Peter Eisentraut wrote: > On 6/13/17 15:51, Bruce Momjian wrote: > > Isn't the leakage controlled by OS permissions, so is it really leakage, > > i.e., if you can see the leakage, you probably have bypassed the OS > > permissions and see the key and data anyway. > > One scenario (among many) is when you're done with the disk. If the > content was fully encrypted, then you can just throw it into the trash > or have your provider dispose of it or reuse it. If not, then, > depending on policy, you will have to physically obtain it and burn it. Oh, I see your point --- db-level encryption stores the file system as mountable on the device, while it is not with storage-level encryption --- got it. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Tue, Jun 13, 2017 at 10:28:14AM -0400, Peter Eisentraut wrote: > On 6/13/17 09:24, Stephen Frost wrote: > > but there are use-cases where it'd be really nice to be able to > > have PG doing the encryption instead of the filesystem because > > then you can do things like backup the database, copy it somewhere > > else directly, and then restore it using the regular PG > > mechanisms, as long as you have access to the key. That's not > > something you can directly do with filesystem-level encryption > > Interesting point. > > I wonder what the proper extent of "encryption at rest" should be. > If you encrypt just on a file or block level, then someone looking > at the data directory or a backup can still learn a number of things > about the number of tables, transaction rates, various configuration > settings, and so on. In the end, information leaks at a strictly positive baud rate because physics (cf. Claude Shannon, et al). Encryption at rest is one technique whereby people can slow this rate, but there's no such thing as getting it to zero. Let's not creep this feature in the ultimately futile attempt to do so. > In the scenario of a sensitive application hosted on a shared > SAN, I don't think that is good enough. > > Also, in the use case you describe, if you use pg_basebackup to make a > direct encrypted copy of a data directory, I think that would mean you'd > have to keep using the same key for all copies. Right. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Peter, * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 6/13/17 15:20, Stephen Frost wrote: > > No, the benefit is that the database administrator can configure it and > > set it up and not have to get an OS-level administrator involved. There > > may also be other reasons why filesystem-level encryption is difficult > > to set up or use in a certain environment, but this wouldn't depend on > > anything OS-related and therefore could be done. > > Let's see a proposal in those terms then. How easy can you make it, > compared to existing OS-level solutions, and will that justify the > maintenance overhead? From the original post on this thread, which included a WIP patch:

----------------------------------
Usage
=====

Set up database like so:

(read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
 export PGENCRYPTIONKEY
 initdb -k -K pgcrypto $PGDATA )

Start PostgreSQL:

(read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
 export PGENCRYPTIONKEY
 postgres $PGDATA )
----------------------------------

That certainly seems very straight-forward to me, though I expect that packagers would probably improve upon it further. > Considering how ubiquitous file-system encryption is, I have my doubts > that the trade-offs will come out right, but let's see. There's definitely environments out there where DBAs aren't able to have root privileges and that limits what they're able to do. I'm not really sure how to objectively weigh "you don't need to be root to encrypt the database" vs. maintenance overhead of this feature. Subjectively, for my 2c anyway, it seems well worth it, but that's naturally subjective. :) Thanks! Stephen
Peter, * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 6/13/17 15:20, Stephen Frost wrote: > > And then you would need openssl on the other system to decrypt it. > > Or make the USB file system encrypted as well? If you're in that kind > of environment, that would surely be feasible, if not required. Right, but requiring file system encryption to work on a USB stick across different types of systems strikes me as actually a higher bar than requiring openssl to exist on both the source and destination sides. Naturally, if the environment you're in has already solved that problem across the enterprise then it's a good approach, although you might want to use a different encryption key, perhaps, though hopefully that's something you'd be able to do pretty easily too. Thanks! Stephen
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Tue, Jun 13, 2017 at 03:20:12PM -0400, Stephen Frost wrote: > > > OK, so let's go back. You are saying there are no security benefits to > > > this vs. file system encryption. > > > > I'm not sure that I can see any, myself.. Perhaps I'm wrong there, but > > it seems unlikely that this would be an improvement over filesystem > > level encryption in the general sense. I'm not sure that I see it as > > really worse than filesystem-level encryption either, to be clear. > > There's a bit of increased information leakage, as Peter mentioned and I > > agreed with, but it's not a lot and I expect in most cases that > > information leak would be acceptable. That seems like something which > > would need to be considered on a case-by-case basis. > > Isn't the leakage controlled by OS permissions, so is it really leakage, > i.e., if you can see the leakage, you probably have bypassed the OS > permissions and see the key and data anyway. The case I'm mainly considering is if you somehow lose control over the medium in which the encrypted database resides- that is, someone steals the hard drive, or perhaps the hard drive is sold without properly being wiped first, things like that. In such a case, there's no OS permissions to bypass because the OS is now controlled by the attacker. In that case, if the key wasn't stored on the hard drive, then the attacker would be able to see the contents of the filesystem and the associated metadata, but not the contents of the cluster. In that case, the distinction between filesystem-level encryption and PG-level encryption is that with filesystem-level encryption the attacker wouldn't be able to even see what files exist or any metadata about them, whereas with PG-level encryption that information would be available to the attacker. In terms of an online attack where an attacker has gained access to the system then, in general, you're right that if the attacker is able to see into the PG data directory at all then they've figured out a way to bypass the OS-level permissions and would then be able to see the data directly anyway. That's a different scenario which would most likely be helped by something like SELinux being used, which could prevent the attacker from being able to look at the PG data directory because the attacker has connected to the system from a network which isn't allowed to directly access those files. > > > The benefit is allowing configuration > > > in the database rather than the OS? > > > > No, the benefit is that the database administrator can configure it and > > set it up and not have to get an OS-level administrator involved. There > > may also be other reasons why filesystem-level encryption is difficult > > to set up or use in a certain environment, but this wouldn't depend on > > anything OS-related and therefore could be done. > > OK, my only point here is that we are going down a slippery slope of > implementing OS things in the database. There is nothing wrong with > that but it has often been something we have avoided, because of the > added complexity required in the db server. I'm not sure that I actually agree that encryption is really solely an OS-level function, or even that encryption at rest is solely the OS's job. As a counter-example, I encrypt my SSH keys and GPG keys routinely, even when I'm using OS-level filesystem encryption. Perhaps that's excessive of me, but, well, I don't find it so. 
:) > As a counter-example, we only added an external collation library to > Postgres when we clearly saw a benefit, e.g. detecting collation > changes. Right, and there's also the potential for adding more flexibility down the road, which I'm certainly all for, but I see value in having even this initial version of the feature too. > > Of course, either way you'd have to provide for a way to get the key > > from one system to the other. > > Uh, doesn't scp do this? I have trouble seeing how avoiding calling > openssl justifies changes to our database server. scp may not be an option as it requires network connectivity between the systems. This is also just one of the use-cases, and not the main reason, at least in my view, to add this feature. > > Also, tools like pg_basebackup could be used, with nearly zero changes, > > I think, to get an encrypted copy of the database for backup purposes, > > removing the need to work out a way to handle encrypting backups. > > I do think we need much more documentation on how to encrypt things, > though that is a separate issue. It might help to document how you > _should_ do things now to see the limitations we have. Improving our documentation would certainly be good, but I'm not sure that we can really recommend specific ways of doing things like filesystem-level encryption, as that's really OS-dependent and there could be trade-offs in different ways a given OS might provide that capability. I'm not sure that having our documentation generically say "you should use filesystem-level encryption" would really be very helpful. Perhaps I misunderstood your suggestion here though..? > > > Is the problem that you have to encrypt before sending and decrypt on > > > arrival, if you don't trust the transmission link? Is that used a lot? > > > Is having the db encrypt every write a reasonable solution to that? > > > > There's multiple use-cases here. Making it easier to copy the database > > is just one of them and it isn't the biggest one. The biggest benefit > > is that there's cases where filesystem-level encryption isn't really an > > option or, even if it is, it's not desirable for other reasons. > > I am thinking NAS storage you don't trust, though there is the leakage > there. Yes, NAS or SAN storage where you don't want the NAS/SAN administrators to have access to the data is a good example of where encryption would be useful. Of course, either filesystem-level or PG-level encryption would address that, but with the trade-off that PG-level encryption would allow the NAS/SAN administrators to see the file metadata, as discussed above. > > > As far as future features, we don't have to add the all features at this > > > time, but if someone has a good idea for an API and we can make it work > > > easily while adding this feature, why wouldn't we do that? > > > > Sure, I'm all for specific suggestions about how to improve the API, or > > just in general recommendations of how to improve the patch. The > > suggestions which have been made about key management don't really come > > across to me as specific API-level recommendations but rather "this > > would also be nice to have" kind of comments, which isn't really the > > same. > > They are related, or you can't even know that because you are/were > preventing the discussion from happening. As an example, full-cluster > encryption doesn't really add any new capabilities, maybe just easier > deployment for some users. 
Where things get useful is fine-grained > encryption, which file system-level encryption can't do. I agree that having fine-grained control would be helpful, but as I was trying to get at, that's going to involve new catalog tables and SQL syntax, and we'd still need to address how to encrypt the cluster-wide files anyway, which is what this patch is working to do. > Also, has anyone asked users if they would find db-encryption better > than file system encryption? I've been asked for this capability multiple times from our users and have generally pushed back and encouraged filesystem-level encryption. That hasn't always been an acceptable solution, unfortunately. Thanks! Stephen
Hi Ants, On Tue, Jun 13, 2017 at 09:07:49AM -0400, Peter Eisentraut wrote: > On 6/12/17 17:11, Ants Aasma wrote: > > I'm curious if the community thinks this is a feature worth having? > > Even considering that security experts would classify this kind of > > encryption as a checkbox feature. > > File system encryption already exists and is well-tested. I don't see > any big advantages in re-implementing all of this one level up. You > would have to touch every single place in PostgreSQL backend and tool > code where a file is being read or written. Yikes. I appreciate your work, but unfortunately I must agree with Peter. On Linux you can configure full disk encryption using LUKS / dm-crypt in like 5 minutes [1]. On FreeBSD you can do the same using geli [2]. In my personal opinion PostgreSQL is already complicated enough. The fact that a few companies hired system administrators who are too lazy to read two or three man pages is not a reason to re-implement file system encryption (or compression, or mirroring, for that matter) in any open source RDBMS. [1] http://eax.me/dm-crypt/ [2] http://eax.me/freebsd-geli/ -- Best regards, Aleksander Alekseev
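For comparison, the file-system route Aleksander points to looks roughly like this on Linux (device name is an example, and luksFormat destroys whatever is on it):

    cryptsetup luksFormat /dev/sdb1          # create the LUKS container, prompts for a passphrase
    cryptsetup luksOpen /dev/sdb1 pgdata     # unlock it as /dev/mapper/pgdata
    mkfs.ext4 /dev/mapper/pgdata
    mount /dev/mapper/pgdata /var/lib/postgresql

Every one of those steps requires root, which is precisely the privilege boundary the rest of this thread is arguing about.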
On Wed, Jun 14, 2017 at 12:04:26PM +0300, Aleksander Alekseev wrote: > Hi Ants, > > On Tue, Jun 13, 2017 at 09:07:49AM -0400, Peter Eisentraut wrote: > > On 6/12/17 17:11, Ants Aasma wrote: > > > I'm curious if the community thinks this is a feature worth having? > > > Even considering that security experts would classify this kind of > > > encryption as a checkbox feature. > > > > File system encryption already exists and is well-tested. I don't see > > any big advantages in re-implementing all of this one level up. You > > would have to touch every single place in PostgreSQL backend and tool > > code where a file is being read or written. Yikes. > > I appreciate your work, but unfortunately I must agree with Peter. > > On Linux you can configure the full disc encryption using LUKS / > dm-crypt in like 5 minutes [1]. On FreeBSD you can do the same using > geli [2]. In my personal opinion PostgreSQL is already complicated > enough. A few companies that hired system administrators that are too > lazy to read two or three man pages is not a reason to re-implement file > system encryption (or compression, or mirroring if that matters) in any > open source RDBMS. > Hi Aleksander, While I agree that configuring full disk encryption is not technically difficult, it requires much more privileged access to the system and basically requires the support of a system administrator. In addition, if a volume is not available for encryption, PostgreSQL support for encryption would still allow for its data to be encrypted and as others have mentioned can be enabled by the DBA alone. Regards, Ken
Hi Kenneth, > > > File system encryption already exists and is well-tested. I don't see > > > any big advantages in re-implementing all of this one level up. You > > > would have to touch every single place in PostgreSQL backend and tool > > > code where a file is being read or written. Yikes. > > > > I appreciate your work, but unfortunately I must agree with Peter. > > > > On Linux you can configure the full disc encryption using LUKS / > > dm-crypt in like 5 minutes [1]. On FreeBSD you can do the same using > > geli [2]. In my personal opinion PostgreSQL is already complicated > > enough. A few companies that hired system administrators that are too > > lazy to read two or three man pages is not a reason to re-implement file > > system encryption (or compression, or mirroring if that matters) in any > > open source RDBMS. > > While I agree that configuring full disk encryption is not technically > difficult, it requires much more privileged access to the system and > basically requires the support of a system administrator. In addition, > if a volume is not available for encryption, PostgreSQL support for > encryption would still allow for its data to be encrypted and as others > have mentioned can be enabled by the DBA alone. Frankly I'm having difficulty imagining when it could be a real problem. It doesn't seem to be such a burden to ask a colleague for assistance in case you don't have sufficient permissions to do something. And I have a strong feeling that solving bureaucracy issues of specific organizations by changing PostgreSQL core in a very invasive way (keeping in mind testing, maintaining, etc) is misguided. -- Best regards, Aleksander Alekseev
> > While I agree that configuring full disk encryption is not technically > > difficult, it requires much more privileged access to the system and > > basically requires the support of a system administrator. In addition, > > if a volume is not available for encryption, PostgreSQL support for > > encryption would still allow for its data to be encrypted and as others > > have mentioned can be enabled by the DBA alone. > > Frankly I'm having difficulties imagining when it could be a real > problem. It doesn't seem to be such a burden to ask a colleague for > assistance in case you don't have sufficient permissions to do > something. And I got a strong feeling that solving bureaucracy issues of > specific organizations by changing PostgreSQL core in very invasive way > (keeping in mind testing, maintaining, etc) is misguided. At the same time, implementing a pluggable storage API and then implementing encrypted / compressed / whatever storage in a standalone extension using this API seems to be a reasonable thing to do. -- Best regards, Aleksander Alekseev
On Wed, Jun 14, 2017 at 04:13:57PM +0300, Aleksander Alekseev wrote: > > > While I agree that configuring full disk encryption is not technically > > > difficult, it requires much more privileged access to the system and > > > basically requires the support of a system administrator. In addition, > > > if a volume is not available for encryption, PostgreSQL support for > > > encryption would still allow for its data to be encrypted and as others > > > have mentioned can be enabled by the DBA alone. > > > > Frankly I'm having difficulties imagining when it could be a real > > problem. It doesn't seem to be such a burden to ask a colleague for > > assistance in case you don't have sufficient permissions to do > > something. And I got a strong feeling that solving bureaucracy issues of > > specific organizations by changing PostgreSQL core in very invasive way > > (keeping in mind testing, maintaining, etc) is misguided. > > At the same time, implementing a pluggable storage API and then implementing > encrypted / compressed / whatever storage in a standalone extension using > this API seems to be a reasonable thing to do. Agreed, good point. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Tue, Jun 13, 2017 at 06:29:20PM -0400, Stephen Frost wrote: > > Isn't the leakage controlled by OS permissions, so is it really leakage, > > i.e., if you can see the leakage, you probably have bypassed the OS > > permissions and see the key and data anyway. > > The case I'm mainly considering is if you somehow lose control over the > medium in which the encrypted database resides- that is, someone steals > the hard drive, or perhaps the hard drive is sold without properly being > wiped first, things like that. > > In such a case, there's no OS permissions to bypass because the OS is > now controlled by the attacker. In that case, if the key wasn't stored > on the hard drive, then the attacker would be able to see the contents > of the filesystem and the associated metadata, but not the contents of > the cluster. > > In that case, the distinction between filesystem-level encryption and > PG-level encryption is that with filesystem-level encryption the > attacker wouldn't be able to even see what files exist or any metadata > about them, whereas with PG-level encryption that information would be > available to the attacker. Yes, Peter Eisentraut pointed that out in an earlier email in this thread. Thanks. > > > > The benefit is allowing configuration > > > > in the database rather than the OS? > > > > > > No, the benefit is that the database administrator can configure it and > > > set it up and not have to get an OS-level administrator involved. There > > > may also be other reasons why filesystem-level encryption is difficult > > > to set up or use in a certain environment, but this wouldn't depend on > > > anything OS-related and therefore could be done. > > > > OK, my only point here is that we are going down a slippery slope of > > implementing OS things in the database. There is nothing wrong with > > that but it has often been something we have avoided, because of the > > added complexity required in the db server. > > I'm not sure that I actually agree that encryption is really solely an > OS-level function, or even that encryption at rest is solely the OS's > job. As a counter-example, I encrypt my SSH keys and GPG keys > routinely, even when I'm using OS-level filesystem encryption. Perhaps > that's excessive of me, but, well, I don't find it so. :) Well, I think SSH and GPG keys are a case of selective encryption, which is where I said database encryption could really be a win, because you can't do that outside the database. Just to go crazy, here is something I think would be cool for a fully or partially encrypted data row:
    row data encrypted with symmetric key k
    k encrypted with public key of user 1
    k encrypted with public key of user 2
    hash of previous fields signed by insert user
That would allow the insert user complete control over who sees the row. The database administrator could corrupt the row or add/remove users, but that would be detected by the hash signature being invalid. This is kind of like TLS bundled in the database. I think this is actually all possible now too. :-) > > As a counter-example, we only added an external collation library to > > Postgres when we clearly saw a benefit, e.g. detecting collation > > changes. > > Right, and there's also the potential for adding more flexibility down > the road, which I'm certainly all for, but I see value in having even > this initial version of the feature too. Understood.
> > > Also, tools like pg_basebackup could be used, with nearly zero changes, > > > I think, to get an encrypted copy of the database for backup purposes, > > > removing the need to work out a way to handle encrypting backups. > > > > I do think we need much more documentation on how to encrypt things, > > though that is a separate issue. It might help to document how you > > _should_ do things now to see the limitations we have. > > Improving our documentation would certainly be good, but I'm not sure > that we can really recommend specific ways of doing things like > filesystem-level encryption, as that's really OS-dependent and there > could be trade-offs in different ways a given OS might provide that > capability. I'm not sure that having our documentation generically say > "you should use filesystem-level encryption" would really be very > helpful. > > Perhaps I misunderstood your suggestion here though..? I was just throwing out the idea that sometimes writing things down shows the gaps in our feature-set --- it might not apply here. > > > > Is the problem that you have to encrypt before sending and decrypt on > > > > arrival, if you don't trust the transmission link? Is that used a lot? > > > > Is having the db encrypt every write a reasonable solution to that? > > > > > > There's multiple use-cases here. Making it easier to copy the database > > > is just one of them and it isn't the biggest one. The biggest benefit > > > is that there's cases where filesystem-level encryption isn't really an > > > option or, even if it is, it's not desirable for other reasons. > > > > I am thinking NAS storage you don't trust, though there is the leakage > > there. > > Yes, NAS or SAN storage where you don't want the NAS/SAN administrators > to have access to the data is a good example of where encryption would > be useful. Of course, either filesystem-level or PG-level encryption > would address that, but with the trade-off that PG-level encryption > would allow the NAS/SAN administrators to see the file metadata, as > discussed above. Uh, as far as I understand it, SAN could technically use encryption because you are sending blocks to the network storage and you could encrypt the blocks before transfer. However, I don't think that would work for NAS/NFS. > > Also, has anyone asked users if they would find db-encryption better > > than file system encryption? > > I've been asked for this capability multiple times from our users and > have generally pushed back and encouraged filesystem-level encryption. > That hasn't always been an acceptable solution, unfortunately. Yes, it would be good to know how often that happens, and whether we should be adjusting Postgres to address it. The idea posted of using plugin storage for encryption seems like a cool idea, and compression and stuff could be added. (However, I realize encryption and compression doesn't work well because of information leakage.) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
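For illustration only, Bruce's per-row idea above could be pictured as a layout along these lines; none of the type or field names below exist in PostgreSQL, and the sizes are placeholders.

    /*
     * Hypothetical layout for a self-protecting row: the payload is encrypted
     * with a per-row symmetric key k, k is wrapped once per authorized user,
     * and a signature over the preceding fields lets readers detect tampering
     * by the administrator.  Names and sizes are invented for illustration.
     */
    typedef struct WrappedKey
    {
        unsigned int   user_id;        /* user allowed to unwrap this copy of k */
        unsigned char  wrapped_k[256]; /* k encrypted with that user's public key */
    } WrappedKey;

    typedef struct EncryptedRow
    {
        unsigned int   payload_len;    /* length of the encrypted row data */
        unsigned char *payload;        /* row data encrypted with symmetric key k */
        unsigned int   nkeys;
        WrappedKey    *keys;           /* one entry per user who may read the row */
        unsigned char  signature[256]; /* hash of all previous fields, signed by the inserting user */
    } EncryptedRow;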
On Tue, Jun 13, 2017 at 6:35 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Of course, what would be even more useful is fine-grained encryption - > encrypt these tables (and the corresponding indexes, toast tables, and > WAL records related to any of that) with this key, encrypt these other > tables (and the same list of associated stuff) with this other key, > and leave the rest unencrypted. The problem with that is that you > probably can't run recovery without all of the keys, and even on a > clean startup there would be a good deal of engineering work involved > in refusing access to tables whose key hadn't been provided yet. That's pretty much the reason why I decided to skip anything more complicated for now. Anything that would be able to run recovery without knowing the encryption key looks like an order of magnitude more complicated to implement. > Performance is likely to be poor on large databases, > because every time a page transits between shared_buffers and the > buffer cache we've got to en/decrypt, but as long as it's only poor > for the people who opt into the feature I don't see a big problem with > that. It would make sense to tune the database with large shared buffers if encryption is enabled. That should make sure that most shared buffers traffic is going to disk anyway. As for performance, I have a prototype assembly implementation of AES that does 3GB/s/core on my laptop. If we add that behind a CPUID check the overhead should be quite reasonable. > I anticipate that one of the trickier problems here will be handling > encryption of the write-ahead log. Suppose you encrypt WAL a block at > a time. In the current system, once you've written and flushed a > block, you can consider it durably committed, but if that block is > encrypted, this is no longer true. A crash might tear the block, > making it impossible to decrypt. Replay will therefore stop at the > end of the previous block, not at the last record actually flushed as > would happen today. My patch is currenly doing a block at a time for WAL. The XTS mode used to encrypt has the useful property that blocks that share identical prefix unencrypted also have identical prefix when encrypted. It requires that the tearing is 16B aligned, but I think that is true for pretty much all storage systems. That property of course has security downsides, but for table/index storage we have a nonce in the form of LSN in the page header eliminating the issue. > So, your synchronous_commit suddenly isn't. A > similar problem will occur any other page where we choose not to > protect against torn pages using full page writes. For instance, > unless checksums are enabled or wal_log_hints=on, we'll write a data > page where a single bit has been flipped and assume that the bit will > either make it to disk or not; the page can't really be torn in any > way that hurts us. But with encryption that's no longer true, because > the hint bit will turn into much more than a single bit flip, and > rereading that page with half old and half new contents will be the > end of the world (TM). I don't know off-hand whether we're > protecting, say, CLOG page writes with FPWs.: because setting a couple > of bits is idempotent and doesn't depend on the existing page > contents, we might not need it currently, but with encryption, every > bit in the page depends on every other bit in the page, so we > certainly would. I don't know how many places we've got assumptions > like this baked into the system, but I'm guessing there are a bunch. 
I think we need to require wal_log_hints=on when encryption is enabled. Currently I have not considered tearing on CLOG bits. Other SLRUs probably have similar issues. I need to think a bit about how to solve that. Regards, Ants Aasma
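As a minimal sketch of the XTS-with-tweak approach Ants describes, assuming OpenSSL's EVP interface: the tweak layout and the function name below are invented for the example and are not taken from the patch, and error handling is reduced to a single return code.

    #include <stdint.h>
    #include <string.h>
    #include <openssl/evp.h>

    #define BLCKSZ 8192

    static int
    encrypt_page_xts(const unsigned char key[32],   /* XTS-128 uses a double-length key */
                     uint32_t spcOid, uint32_t dbOid, uint32_t blkno,
                     const unsigned char *plain, unsigned char *cipher)
    {
        unsigned char   tweak[16] = {0};
        EVP_CIPHER_CTX *ctx;
        int             len, ok;

        /* Hypothetical tweak: tablespace oid, database oid, block number. */
        memcpy(tweak, &spcOid, 4);
        memcpy(tweak + 4, &dbOid, 4);
        memcpy(tweak + 8, &blkno, 4);

        if ((ctx = EVP_CIPHER_CTX_new()) == NULL)
            return 0;
        ok = EVP_EncryptInit_ex(ctx, EVP_aes_128_xts(), NULL, key, tweak) == 1 &&
             EVP_EncryptUpdate(ctx, cipher, &len, plain, BLCKSZ) == 1 &&
             EVP_EncryptFinal_ex(ctx, cipher + len, &len) == 1;
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

Because XTS is length-preserving, identical plaintext prefixes still map to identical ciphertext prefixes within a block, which is the 16-byte-aligned tearing property discussed above.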
On Wed, Jun 14, 2017 at 06:10:32PM +0300, Ants Aasma wrote: > On Tue, Jun 13, 2017 at 6:35 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > Performance is likely to be poor on large databases, > > because every time a page transits between shared_buffers and the > > buffer cache we've got to en/decrypt, but as long as it's only poor > > for the people who opt into the feature I don't see a big problem with > > that. > > It would make sense to tune the database with large shared buffers if > encryption is enabled. That should make sure that most shared buffers > traffic is going to disk anyway. As for performance, I have a > prototype assembly implementation of AES that does 3GB/s/core on my > laptop. If we add that behind a CPUID check the overhead should be > quite reasonable. Are you checking the CPU type or if AES instructions are enabled on the CPU? I ask this because I just realized in researching my new TLS talk that my BIOS defaults to AES instructions disabled, and I had to manually enable it. > > I anticipate that one of the trickier problems here will be handling > > encryption of the write-ahead log. Suppose you encrypt WAL a block at > > a time. In the current system, once you've written and flushed a > > block, you can consider it durably committed, but if that block is > > encrypted, this is no longer true. A crash might tear the block, > > making it impossible to decrypt. Replay will therefore stop at the > > end of the previous block, not at the last record actually flushed as > > would happen today. > > My patch is currenly doing a block at a time for WAL. The XTS mode Uh, how are you writing partial writes to the WAL. I assume you are doing a streaming cipher so you can write in increments, right? > used to encrypt has the useful property that blocks that share > identical prefix unencrypted also have identical prefix when > encrypted. It requires that the tearing is 16B aligned, but I think > that is true for pretty much all storage systems. That property of > course has security downsides, but for table/index storage we have a > nonce in the form of LSN in the page header eliminating the issue. > > > So, your synchronous_commit suddenly isn't. A > > similar problem will occur any other page where we choose not to > > protect against torn pages using full page writes. For instance, > > unless checksums are enabled or wal_log_hints=on, we'll write a data > > page where a single bit has been flipped and assume that the bit will > > either make it to disk or not; the page can't really be torn in any > > way that hurts us. But with encryption that's no longer true, because > > the hint bit will turn into much more than a single bit flip, and > > rereading that page with half old and half new contents will be the > > end of the world (TM). I don't know off-hand whether we're > > protecting, say, CLOG page writes with FPWs.: because setting a couple > > of bits is idempotent and doesn't depend on the existing page > > contents, we might not need it currently, but with encryption, every > > bit in the page depends on every other bit in the page, so we > > certainly would. I don't know how many places we've got assumptions > > like this baked into the system, but I'm guessing there are a bunch. > > I think we need to require wal_log_hints=on when encryption is > enabled. Currently I have not considered tearing on CLOG bits. Other > SLRUs probably have similar issues. I need to think a bit about how to > solve that. I am not sure if clog even needs to be encrypted. 
On Wed, Jun 14, 2017 at 6:26 PM, Bruce Momjian <bruce@momjian.us> wrote: > Are you checking the CPU type or if AES instructions are enabled on the > CPU? I ask this because I just realized in researching my new TLS talk > that my BIOS defaults to AES instructions disabled, and I had to > manually enable it. There is zero code for that now, but the plan was to check the CPUID instruction. My understanding is that it should report what is currently enabled on the CPU. Will double check when actually writing the code for the check. >> > I anticipate that one of the trickier problems here will be handling >> > encryption of the write-ahead log. Suppose you encrypt WAL a block at >> > a time. In the current system, once you've written and flushed a >> > block, you can consider it durably committed, but if that block is >> > encrypted, this is no longer true. A crash might tear the block, >> > making it impossible to decrypt. Replay will therefore stop at the >> > end of the previous block, not at the last record actually flushed as >> > would happen today. >> >> My patch is currenly doing a block at a time for WAL. The XTS mode > > Uh, how are you writing partial writes to the WAL. I assume you are > doing a streaming cipher so you can write in increments, right? We were doing 8kB page aligned writes to WAL anyway. I just encrypt the block before it gets written out. >> I think we need to require wal_log_hints=on when encryption is >> enabled. Currently I have not considered tearing on CLOG bits. Other >> SLRUs probably have similar issues. I need to think a bit about how to >> solve that. > > I am not sure if clog even needs to be encrypted. Me neither, but it currently is, and it looks like that's broken in a "silently corrupts your data" way in face of torn writes. Using OFB mode (xor plaintext with pseudorandom stream for cipher) looks like it might help here, if other approaches fail. Regards, Ants Aasma
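For reference, such a check could look roughly like this on GCC/Clang, using CPUID leaf 1, where ECX bit 25 reports AES-NI; the bit should also read as clear when the BIOS has disabled the feature, matching Ants' understanding above. This is a sketch, not code from the patch.

    #include <cpuid.h>
    #include <stdbool.h>

    static bool
    have_aes_ni(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1: feature flags; returns false if CPUID is unavailable. */
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return false;
        return (ecx & (1u << 25)) != 0;     /* ECX bit 25 = AESNI */
    }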
On Wed, Jun 14, 2017 at 06:41:43PM +0300, Ants Aasma wrote: > On Wed, Jun 14, 2017 at 6:26 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Are you checking the CPU type or if AES instructions are enabled on the > > CPU? I ask this because I just realized in researching my new TLS talk > > that my BIOS defaults to AES instructions disabled, and I had to > > manually enable it. > > There is zero code for that now, but the plan was to check the CPUID > instruction. My understanding is that it should report what is > currently enabled on the CPU. Will double check when actually writing > the code for the check. Just for specifics, I have two Intel Xeon CPU E5620, but the AES instructions were disabled for this CPU since 2012 when I bought it. :-( The good news is that only recently have I forced https in some pages so this is the first time I heavily need it. I now have a boot test, which returns 16: grep -c '\<aes\>' /proc/cpuinfo > >> > I anticipate that one of the trickier problems here will be handling > >> > encryption of the write-ahead log. Suppose you encrypt WAL a block at > >> > a time. In the current system, once you've written and flushed a > >> > block, you can consider it durably committed, but if that block is > >> > encrypted, this is no longer true. A crash might tear the block, > >> > making it impossible to decrypt. Replay will therefore stop at the > >> > end of the previous block, not at the last record actually flushed as > >> > would happen today. > >> > >> My patch is currently doing a block at a time for WAL. The XTS mode > > > > Uh, how are you writing partial writes to the WAL. I assume you are > > doing a streaming cipher so you can write in increments, right? > > We were doing 8kB page aligned writes to WAL anyway. I just encrypt > the block before it gets written out. Oh, we do. The beauty of streaming ciphers built on block ciphers is that you can pre-compute the cipher to be XOR'ed with the data because the block cipher output doesn't depend on the user data. This is used for SSH, for example. > >> I think we need to require wal_log_hints=on when encryption is > >> enabled. Currently I have not considered tearing on CLOG bits. Other > >> SLRUs probably have similar issues. I need to think a bit about how to > >> solve that. > > > > I am not sure if clog even needs to be encrypted. > > Me neither, but it currently is, and it looks like that's broken in a > "silently corrupts your data" way in face of torn writes. Using OFB > mode (xor plaintext with pseudorandom stream for cipher) looks like it > might help here, if other approaches fail. I would just document the limitation and move on. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
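A small sketch of the keystream pre-computation Bruce mentions, assuming AES-128-CTR via OpenSSL; the names and buffer size are illustrative. In a counter-style mode the cipher output depends only on the key and the counter/IV, so encrypting zero bytes yields the raw keystream, which can later be XORed with data as it is appended.

    #include <string.h>
    #include <openssl/evp.h>

    /* Pre-compute 'len' bytes of keystream by encrypting zeroes in CTR mode. */
    static int
    precompute_keystream(const unsigned char key[16], const unsigned char iv[16],
                         unsigned char *keystream, int len)
    {
        unsigned char   zeroes[8192] = {0};
        EVP_CIPHER_CTX *ctx;
        int             outlen, ok;

        if (len < 0 || len > (int) sizeof(zeroes))
            return 0;
        ctx = EVP_CIPHER_CTX_new();
        ok = ctx != NULL &&
             EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv) == 1 &&
             EVP_EncryptUpdate(ctx, keystream, &outlen, zeroes, len) == 1;
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

    /* Later, encrypting an incremental write is just an XOR with the keystream. */
    static void
    xor_with_keystream(unsigned char *data, const unsigned char *keystream, int len)
    {
        for (int i = 0; i < len; i++)
            data[i] ^= keystream[i];
    }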
On 6/14/17 05:04, Aleksander Alekseev wrote: > A few companies that hired system administrators that are too > lazy to read two or three man pages is not a reason to re-implement file > system encryption (or compression, or mirroring if that matters) in any > open source RDBMS. To be fair, we did implement our own compression (TOAST) and mirroring (compared to, say, DRBD) because there were clear advantages in simplicity and performance. Checksumming is another area where we moved forward in spite of arguments that the file system should do it. So we will need to see the whole picture. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 6/13/17 18:11, Stephen Frost wrote: >> Let's see a proposal in those terms then. How easy can you make it, >> compared to existing OS-level solutions, and will that justify the >> maintenance overhead? > From the original post on this thread, which included a WIP patch: > > ---------------------------------- > Usage > ===== > > Set up database like so: > > (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo; > export PGENCRYPTIONKEY > initdb -k -K pgcrypto $PGDATA ) > > Start PostgreSQL: > > (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo; > export PGENCRYPTIONKEY > postgres $PGDATA ) > ---------------------------------- Relying on environment variables is clearly pretty crappy. So if that's the proposal, then I think it needs to be better. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter, * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 6/13/17 18:11, Stephen Frost wrote: > >> Let's see a proposal in those terms then. How easy can you make it, > >> compared to existing OS-level solutions, and will that justify the > >> maintenance overhead? > > From the original post on this thread, which included a WIP patch: > > > > ---------------------------------- > > Usage > > ===== > > > > Set up database like so: > > > > (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo; > > export PGENCRYPTIONKEY > > initdb -k -K pgcrypto $PGDATA ) > > > > Start PostgreSQL: > > > > (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo; > > export PGENCRYPTIONKEY > > postgres $PGDATA ) > > ---------------------------------- > > Relying on environment variables is clearly pretty crappy. So if that's > the proposal, then I think it needs to be better. I don't believe that was ever intended to be the final solution, I was just pointing out that it's what the WIP patch did. The discussion had moved into having a command called which provided the key on stdout, as I recall, allowing it to be whatever the user wished, including binary of any kind. If you have other suggestions, I'm sure they would be well received. As to the question of complexity, it certainly looks like it'll probably be quite straight-forward for users to use. Thanks! Stephen
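As an illustration of that stdout-based idea (not code from any posted patch; the function name and the fixed key length are invented), the server could run a configured command and read raw, possibly binary, key material from its output:

    #include <stdio.h>

    #define KEY_LEN 32

    static int
    read_key_from_command(const char *key_command, unsigned char key[KEY_LEN])
    {
        FILE   *fh;
        size_t  nread;

        if ((fh = popen(key_command, "r")) == NULL)
            return 0;
        nread = fread(key, 1, KEY_LEN, fh);     /* binary-safe, unlike a prompt */
        if (pclose(fh) != 0)
            return 0;
        return nread == KEY_LEN;
    }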
On Wed, Jun 14, 2017 at 5:41 PM, Stephen Frost <sfrost@snowman.net> wrote: > I don't believe that was ever intended to be the final solution, I was > just pointing out that it's what the WIP patch did. > > The discussion had moved into having a command called which provided the > key on stdout, as I recall, allowing it to be whatever the user wished, > including binary of any kind. > > If you have other suggestions, I'm sure they would be well received. As > to the question of complexity, it certainly looks like it'll probably be > quite straight-forward for users to use. To me, this reads a bit like you're still trying to shut down the discussion here. Perhaps I am misreading it. Upthread, you basically said that we shouldn't talk about key management (specifically, you said, "Key management is an entirely independent discussion from this") which I think is a ridiculous statement. We have to have some kind of simple key management in order to have the feature at all. It does not have to be crazy complicated, but it has to exist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6/14/17 17:41, Stephen Frost wrote: >> Relying on environment variables is clearly pretty crappy. So if that's >> the proposal, then I think it needs to be better. > I don't believe that was ever intended to be the final solution, I was > just pointing out that it's what the WIP patch did. > > The discussion had moved into having a command called which provided the > key on stdout, as I recall, allowing it to be whatever the user wished, > including binary of any kind. > > If you have other suggestions, I'm sure they would be well received. As > to the question of complexity, it certainly looks like it'll probably be > quite straight-forward for users to use. I think the passphrase entry part of the problem is actually a bit harder than it appears. Making this work well would be a major part of the usability story that this is being sold on. If the proposed solution is that you can cobble together a few bits of shell, then not only is that not very user-friendly, it also won't work consistently across platforms, won't work under systemd (launchd? Windows service?), and might behave awkwardly under restricted environments where there is no terminal or only a limited OS environment. Moreover, it leaves the security aspects of that part of the solution (keys lingering in memory or in swap) up to the user. There was a discussion a while ago about how to handle passphrase entry for SSL keys. The conclusion was that it works pretty crappily right now, and several suggestions for improvement were discussed. I suggest that fixing that properly and with flexibility could also yield a solution for encryption key entry. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 15, 2017 at 12:06 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Making this work well would be a major part of the usability story that > this is being sold on. If the proposed solution is that you can cobble > together a few bits of shell, then not only is that not very > user-friendly, it also won't work consistently across platforms, won't > work under systemd (launchd? Windows service?), and might behave > awkwardly under restricted environments where there is no terminal or > only a limited OS environment. Moreover, it leaves the security aspects > of that part of the solution (keys lingering in memory or in swap) up to > the user. > > There was a discussion a while ago about how to handle passphrase entry > for SSL keys. The conclusion was that it works pretty crappily right > now, and several suggestions for improvement were discussed. I suggest > that fixing that properly and with flexibility could also yield a > solution for encryption key entry. That sounds like a good idea to me. However, I'd like to disagree with the idea that key management is the primary way in which this feature would improve usability. To me, the big advantage is that you don't need to be root (also, we can have more consistency in behavior across operating systems). I disagree vigorously with the idea that anyone who wants to encrypt their PostgreSQL database should just get root privileges on the system and use an encrypted filesystem. In some environments, "just get root privileges" is not something that is very easy to do; but even if you *have* root privileges, you don't necessarily want to have to use them just to install and configure your database. Right now, I can compile and install PostgreSQL from my own user account, run initdb, and start it up. Surely every PostgreSQL developer in the world - and a good number of users - would agree that if I suddenly needed to run initdb as root, that would be a huge usability regression. You wouldn't even be able to run 'make check' without root privileges, which would suck. In the same way, when we removed (for most users) the need to tune System V shared memory parameters (b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67), a lot of users were very happy precisely because it eliminated a setup step that formerly had to be done as root. It is very reasonable to suppose that users who need encryption will similarly be happy if they no longer need to be root to get an encrypted PostgreSQL working. Sure, that's only a subset of users rather than all of them, but it's the same kind of issue. Also, I don't think we should be presenting filesystem encryption and built-in encryption as if they were two methods of solving the exact same problem. In some scenarios, they are in fact solving the same problem. However, it's entirely reasonable to want to use both. For example, the hard disk in my laptop is encrypted, because that's a thing Apple does. If somebody steals the hard disk out of my laptop, they may have some difficulty recovering the contents. However, as Stephen also mentioned, that has not deterred me from putting a passphrase on my SSH keys. There are situations in which the passphrase provides protection that the whole-drive encryption won't. For example, if I copy my home directory onto a USB stick and then copy it from there to a new laptop, somebody might steal the USB stick. If they manage to do that, they will get most of my files, but they won't get my SSH keys, or at least not without guessing my passphrase.
Similarly, if I foolishly walk away from my laptop in the presence of some nefarious person (say, Magnus) without locking it, that person can't steal my keys. That person might be able to impersonate me for as long as I'm away from the laptop, if the keys are loaded into my SSH agent, but not afterwards. The issues for the database are similar. You might want one of these things or the other or both depending on the exact situation. To be honest, I find the hostility toward this feature a bit baffling. The argument seems to be essentially that we shouldn't have this feature because we'd have to maintain the code and many of the same goals could be accomplished by using facilities that already exist outside the database server. But that's also true for parallel query (cf. Stado), logical replication (cf. Slony, Bucardo, Londiste), physical replication (cf. DRBD), partitioning (cf. pg_partman), RLS (cf. veil), and anything that could be written as application logic (eg. psql's \if ... \endif, every procedural language we have, user-defined functions themselves, database-enforced constraints, FDWs). Yet, in every one of those cases, we find it worthwhile to have the feature because it works better and is easier to use when it's built in. I don't think that a patch for this feature is likely to be bigger than (or even as large as) the patches for logical replication or parallel query, and it will probably be less work to maintain going forward than either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 15, 2017 at 03:09:32PM -0400, Robert Haas wrote: > To be honest, I find the hostility toward this feature a bit baffling. > The argument seems to be essentially that we shouldn't have this > feature because we'd have to maintain the code and many of the same > goals could be accomplished by using facilities that already exist > outside the database server. But that's also true for parallel query > (cf. Stado), logical replication (cf. Slony, Bucardo, Londiste), > physical replication (cf. DRBD), partitioning (cf. pg_partman), RLS > (cf. veil), and anything that could be written as application logic > (eg. psql's \if ... \endif, every procedural language we have, > user-defined functions themselves, database-enforced constraints, > FDWs). Yet, in every one of those cases, we find it worthwhile to > have the feature because it works better and is easier to use when > it's built in. I don't think that a patch for this feature is likely > to be bigger than (or even as large as) the patches for logical > replication or parallel query, and it will probably be less work to > maintain going forward than either. I think the big win for having OS features in the database is selectivity --- the ability to selectively apply a feature to part of the database. This is what you are doing by putting a password on your SSH key, and my idea about row encryption. It is also a question of convenience. If SSH told users they have to create an encrypted volume to store their SSH keys with a password, it would be silly, since the files are so small compared to a file system. I think the assumption is that any security-concerned deployment of Postgres will already have Postgres on its own partition and have the root administrator involved. I think it is this assumption that drives the idea that requiring root to run Postgres doesn't make sense, but it does to do encryption. Also, there is the sense that security requires trust of the root user, while using Postgres doesn't require the root user to also use Postgres. One serious difference between in-database-encryption and SSH keys is that the use of passwords for SSH is well understood and reasonable to use, while I think we all admit that use of passwords for database objects like SSL keys is murky. Use of keys for OS-level encryption is a little better handled, but not as clean as SSH keys. I admit there is no hard line here, so I guess we will have to see what the final patch looks like. I am basing my statements on what I guess the complexity will be. Complexity has a cost so we will have to weigh it when we see it. When SSH added password access, it was probably an easy decision because the use-case was high and the complexity was low. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Thu, Jun 15, 2017 at 4:29 PM, Bruce Momjian <bruce@momjian.us> wrote: > I think the big win for having OS features in the database is > selectivity --- the ability to selectively apply a feature to part of > the database. This is what you are doing by putting a password on your > SSH key, and my idea about row encryption. I agree. I think we will eventually want to be able to apply encryption selectively, although I don't think we need to have that feature in a first patch. One problem is that if you don't encrypt the WAL, there's not much point in encrypting the table data, so it becomes tricky to encrypt some things and not others. However, I am sure we can eventually solve those problems, given enough time and development effort. > It is also a question of convenience. If SSH told users they have to > create an encrypted volume to store their SSH keys with a password, it > would be silly, since the files are so small compared to a file system. > I think the assumption is that any security-concerned deployment of > Postgres will already have Postgres on its own partition and have the > root administrator involved. I think it is this assumption that drives > the idea that requiring root to run Postgres doesn't make sense, but it > does to do encryption. I don't think that's a particularly good assumption, though, especially with the proliferation of virtual and containerized environments where access to root privileges tends to be more circumscribed. Also, there are a lot of small databases out there that you might want to be able to encrypt without encrypting everything on the filesystem. For example, there are products that embed PostgreSQL. If a particular PostgreSQL configuration requires root access, then using that configuration means that installing the product which contains it also requires root access. Installing the product means changing /etc/fstab, and uninstalling it means reversing those changes. That's very awkward. I agree that if you've got a terabyte of sensitive data, you probably want to encrypt the filesystem and involve the DBA, but there are still people who have a gigabyte of sensitive data. For those people, a separate filesystem likely doesn't make sense, but they may still want encryption. > Also, there is the sense that security requires > trust of the root user, while using Postgres doesn't require the root > user to also use Postgres. I don't understand this. It is certainly true that you're running binaries owned by root, the root user could Trojan the binaries and break any security you think you have. But that problem is no better or worse for PostgreSQL than anything else. > One serious difference between in-database-encryption and SSH keys is > that the use of passwords for SSH is well understood and reasonable to > use, while I think we all admit that use of passwords for database > objects like SSL keys is murky. Use of keys for OS-level encryption is > a little better handled, but not as clean as SSH keys. Peter pointed out upthread that our handling of SSL passphrases leaves a lot to be desired, and that maybe we should fix that problem first; I agree. But I don't think this is any kind of intrinsic limitation of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a quality-of-implementation issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 15, 2017 at 05:04:17PM -0400, Robert Haas wrote: > > Also, there is the sense that security requires > > trust of the root user, while using Postgres doesn't require the root > > user to also use Postgres. > > I don't understand this. It is certainly true that you're running > binaries owned by root, the root user could Trojan the binaries and > break any security you think you have. But that problem is no better > or worse for PostgreSQL than anything else. I couldn't find a cleaner way to see it --- it is that database use doesn't involve the root user using it, while database security requires the root user to also be security-conscious. > > One serious difference between in-database-encryption and SSH keys is > > that the use of passwords for SSH is well understood and reasonable to > > use, while I think we all admit that use of passwords for database > > objects like SSL keys is murky. Use of keys for OS-level encryption is > > a little better handled, but not as clean as SSH keys. > > Peter pointed out upthread that our handling of SSL passphrases leaves > a lot to be desired, and that maybe we should fix that problem first; > I agree. But I don't think this is any kind of intrinsic limitation > of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a > quality-of-implementation issue. I think there are environmental issues that make password use on SSH easier than the other cases --- it isn't just code quality. However, it would be good to research how SSH handles it to see if we can get any ideas. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Thu, Jun 15, 2017 at 05:04:17PM -0400, Robert Haas wrote: > > > Also, there is the sense that security requires > > > trust of the root user, while using Postgres doesn't require the root > > > user to also use Postgres. > > > > I don't understand this. It is certainly true that you're running > > binaries owned by root, the root user could Trojan the binaries and > > break any security you think you have. But that problem is no better > > or worse for PostgreSQL than anything else. > > I couldn't find a cleaner way to see it --- it is that database use > doesn't involve the root user using it, while database security requires > the root user to also be security-conscious. I tend to agree with Robert here (in general). Further, there are certainly environments where the administrator with root access is absolutely security conscious, but that doesn't mean that the DBA has easy access to the root account or to folks who have that level of access and it's much easier for the DBA if they're able to address all of their requirements by building PG themselves, installing it, and, ideally, encrypting the DB. > > > One serious difference between in-database-encryption and SSH keys is > > > that the use of passwords for SSH is well understood and reasonable to > > > use, while I think we all admit that use of passwords for database > > > objects like SSL keys is murky. Use of keys for OS-level encryption is > > > a little better handled, but not as clean as SSH keys. > > > > Peter pointed out upthread that our handling of SSL passphrases leaves > > a lot to be desired, and that maybe we should fix that problem first; > > I agree. But I don't think this is any kind of intrinsic limitation > > of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a > > quality-of-implementation issue. I'm not thrilled with asking Ants to implement a solution to SSL passphrases, and generalizing it to work for this, to get this feature accepted. I assume that the reason for asking for that work to be done now is because we decided that the current approach for SSL sucks but we couldn't actually drop support for it, but we don't want to add other features which work in a similar way because, well, it sucks. I get that. I'm not thrilled with it, but I get it. I'm hopeful it ends up not being too bad, but if it ends up meaning we don't get this feature, then I'll reconsider my position about agreeing that it's an acceptable requirement. > I think there are environmental issues that make password use on SSH > easier than the other cases --- it isn't just code quality. However, it > would be good to research how SSH handles it to see if we can get any > ideas. Actually, the approach SSH uses is a really good one, imv, and one which we might be able to leverage.. I'm not sure though. I will say that, in general, I like the idea of leveraging the external libraries which handle keys and deal with encryption to make this happen as those allow things like hardware devices to hold the key and possibly perform the encryption/decryption, etc. Thanks! Stephen
On Thu, Jun 15, 2017 at 06:41:08PM -0400, Stephen Frost wrote: > > > > One serious difference between in-database-encryption and SSH keys is > > > > that the use of passwords for SSH is well understood and reasonable to > > > > use, while I think we all admit that use of passwords for database > > > > objects like SSL keys is murky. Use of keys for OS-level encryption is > > > > a little better handled, but not as clean as SSH keys. > > > > > > Peter pointed out upthread that our handling of SSL passphrases leaves > > > a lot to be desired, and that maybe we should fix that problem first; > > > I agree. But I don't think this is any kind of intrinsic limitation > > > of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a > > > quality-of-implementation issue. > > I'm not thrilled with asking Ants to implement a solution to SSL > passphrases, and generalizing it to work for this, to get this feature > accepted. I assume that the reason for asking for that work to be done > now is because we decided that the current approach for SSL sucks but we > couldn't actually drop support for it, but we don't want to add other > features which work in a similar way because, well, it sucks. My point is that if our support for db-level encryption is as bad as SSL key passwords, then it will be nearly useless, so we might as well not have it. Isn't that obvious? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Thu, Jun 15, 2017 at 06:41:08PM -0400, Stephen Frost wrote: > > > > > One serious difference between in-database-encryption and SSH keys is > > > > > that the use of passwords for SSH is well understood and reasonable to > > > > > use, while I think we all admit that use of passwords for database > > > > > objects like SSL keys is murky. Use of keys for OS-level encryption is > > > > > a little better handled, but not as clean as SSH keys. > > > > > > > > Peter pointed out upthread that our handling of SSL passphrases leaves > > > > a lot to be desired, and that maybe we should fix that problem first; > > > > I agree. But I don't think this is any kind of intrinsic limitation > > > > of PostgreSQL vs. encrypted filesystems vs. SSH; it's just a > > > > quality-of-implementation issue. > > > > I'm not thrilled with asking Ants to implement a solution to SSL > > passphrases, and generalizing it to work for this, to get this feature > > accepted. I assume that the reason for asking for that work to be done > > now is because we decided that the current approach for SSL sucks but we > > couldn't actually drop support for it, but we don't want to add other > > features which work in a similar way because, well, it sucks. > > My point is that if our support for db-level encryption is as bad as SSL > key passwords, then it will be nearly useless, so we might as well not > have it. Isn't that obvious? Well, no, because the reason we even have an approach at all for SSL key passwords is because multiple people (myself and Magnus, at least, as I recall) complained as we are aware of installations which are actively using that approach. That doesn't mean it's a great solution or that it doesn't suck - really, I tend to agree that it does, but it's necessary because we need a solution, it works, and users are using it. Having a better solution would be great and something agent-based might be the right answer (or perhaps something where we have support for using hardware accelerators through an existing library...). I expect the same would happen with the shell-command approach suggested up-thread and the prompt-on-stdin approach too, they aren't great but I expect users would still use the feature. As Robert and I have mentioned, there is a good bit of value to having this feature simply because it avoids the need to get someone with root privileges to set up an encrypted volume and I don't think having to use a shell command or providing the password on stdin at startup really changes that very much. Thanks! Stephen
On Thu, Jun 15, 2017 at 07:27:55PM -0400, Stephen Frost wrote: > I expect the same would happen with the shell-command approach suggested > up-thread and the prompt-on-stdin approach too, they aren't great but I > expect users would still use the feature. As Robert and I have > mentioned, there is a good bit of value to having this feature simply > because it avoids the need to get someone with root privileges to set up > an encrypted volume and I don't think having to use a shell command or > providing the password on stdin at startup really changes that very > much. Understood, but now you are promoting a feature with an admittedly-poor API, duplication of an OS feature, and perhaps an invasive change to the code. Those are high hurdles. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce, * Bruce Momjian (bruce@momjian.us) wrote: > On Thu, Jun 15, 2017 at 07:27:55PM -0400, Stephen Frost wrote: > > I expect the same would happen with the shell-command approach suggested > > up-thread and the prompt-on-stdin approach too, they aren't great but I > > expect users would still use the feature. As Robert and I have > > mentioned, there is a good bit of value to having this feature simply > > because it avoids the need to get someone with root privileges to set up > > an encrypted volume and I don't think having to use a shell command or > > providing the password on stdin at startup really changes that very > > much. > > Understood, but now you are promoting a feature with an admittedly-poor > API, duplication of an OS feature, and perhaps an invasive change to the > code. Those are high hurdles. Of those, the only one that worries me, at least, is that it might be an invasive and difficult to maintain code change. As Robert said, and I agree with, "duplication of an OS feature" is something we pretty routinely, and justifiably, do. The poor interface is unfortunate, but if it's consistent with what we have today for a similar feature then I'm really not too upset with it. If we can do better, great, I'm all for that, but if not, then I'd rather have the feature with the poor interface than not have it at all. If it's an invasive code change or one which ends up being difficult to maintain, then that's a problem. Getting some focus on that aspect would be great and I certainly appreciate Robert's initial review and commentary on it. Thanks! Stephen
Bruce Momjian wrote: > On Thu, Jun 15, 2017 at 07:27:55PM -0400, Stephen Frost wrote: > > I expect the same would happen with the shell-command approach suggested > > up-thread and the prompt-on-stdin approach too, they aren't great but I > > expect users would still use the feature. As Robert and I have > > mentioned, there is a good bit of value to having this feature simply > > because it avoids the need to get someone with root privileges to set up > > an encrypted volume and I don't think having to use a shell command or > > providing the password on stdin at startup really changes that very > > much. > > Understood, but now you are promoting a feature with an admittedly-poor > API, duplication of an OS feature, and perhaps an invasive change to the > code. Those are high hurdles. I thought we called it "incremental development". From the opposite point of view, would you say we should ban use of passphrase-protected SSL key files because the current user interface for them is bad? I have no use for data-at-rest encryption myself, but I wouldn't stop development just because the initial design proposal doesn't include top-notch key management. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 15, 2017 at 07:51:36PM -0400, Alvaro Herrera wrote: > Bruce Momjian wrote: > > On Thu, Jun 15, 2017 at 07:27:55PM -0400, Stephen Frost wrote: > > > I expect the same would happen with the shell-command approach suggested > > > up-thread and the prompt-on-stdin approach too, they aren't great but I > > > expect users would still use the feature. As Robert and I have > > > mentioned, there is a good bit of value to having this feature simply > > > because it avoids the need to get someone with root privileges to set up > > > an encrypted volume and I don't think having to use a shell command or > > > providing the password on stdin at startup really changes that very > > > much. > > > > Understood, but now you are promoting a feature with an admittedly-poor > > API, duplication of an OS feature, and perhaps an invasive change to the > > code. Those are high hurdles. > > I thought we called it "incremental development". From the opposite > point of view, would you say we should ban use of passphrase-protected > SSL key files because the current user interface for them is bad? > > I have no use for data-at-rest encryption myself, but I wouldn't stop > development just because the initial design proposal doesn't include > top-notch key management. Yes, but we have to have a plan on how to improve it. Why add a feature that is hard to maintain, and hard to use. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 2017-06-15 19:44:43 -0400, Bruce Momjian wrote: > Understood, but now you are promoting a feature with an admittedly-poor > API, duplication of an OS feature, and perhaps an invasive change to the > code. *Perhaps* an invasive change to the code? To me it's pretty evident that this'll be a pretty costly feature from that angle. We've quite a few places that manipulate on-disk files, and they'll all have to be manipulated. Several of those are essentially critical sections, adding memory allocations to them wouldn't be good, so we'll need pre-allocation APIs. I've only skimmed the discussion, but based on that I'm very surprised how few concerns about this feature's complexity / maintainability impact have been raised. - Andres
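A sketch of the pre-allocation pattern Andres alludes to, with invented names and OpenSSL used purely for illustration: everything the encryption path needs is set up once at startup, so the code that runs inside a critical section only re-initializes the tweak and never allocates.

    #include <stdlib.h>
    #include <openssl/evp.h>

    #define BLCKSZ 8192

    static EVP_CIPHER_CTX *page_ctx = NULL;
    static unsigned char  *page_buf = NULL;

    /* Startup: allocate the context and bounce buffer, set cipher and key once. */
    static int
    encryption_startup(const unsigned char key[32])     /* double-length XTS-128 key */
    {
        page_ctx = EVP_CIPHER_CTX_new();
        page_buf = malloc(BLCKSZ);
        if (page_ctx == NULL || page_buf == NULL)
            return 0;
        return EVP_EncryptInit_ex(page_ctx, EVP_aes_128_xts(), NULL, key, NULL);
    }

    /* Write path: only re-initializes the tweak; no memory allocation here. */
    static unsigned char *
    encrypt_page_noalloc(const unsigned char *plain, const unsigned char tweak[16])
    {
        int len;

        if (EVP_EncryptInit_ex(page_ctx, NULL, NULL, NULL, tweak) != 1 ||
            EVP_EncryptUpdate(page_ctx, page_buf, &len, plain, BLCKSZ) != 1)
            return NULL;
        /* XTS is unpadded, so EVP_EncryptUpdate already produced the full block. */
        return page_buf;
    }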
On Thu, Jun 15, 2017 at 04:56:36PM -0700, Andres Freund wrote: > On 2017-06-15 19:44:43 -0400, Bruce Momjian wrote: > > Understood, but now you are promoting a feature with an admittedly-poor > > API, duplication of an OS feature, and perhaps an invasive change to the > > code. > > *Perhaps* an invasive change to the code? To me it's pretty evident > that this'll be a pretty costly feature from that angle. We've quite a > few places that manipulate on-disk files, and they'll all have to be > manipulated. Several of those are essentially critical sections, adding > memory allocations to them wouldn't be good, so we'll need > pre-allocation APIs. > > I've only skimmed the discussion, but based on that I'm very surprised > how few concerns about this feature's complexity / maintainability > impact have been raised. Yeah, I guess we will just have to wait to see it since other people are excited about it. My concern is code complexity and usability challenges, vs punting the problem to the operating system, though admittedly there are some cases where that is not possible. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
> Yeah, I guess we will just have to wait to see it since other people are > excited about it. My concern is code complexity and usability > challenges, vs punting the problem to the operating system, though > admittedly there are some cases where that is not possible. Oracle sells this feature only with the expensive enterprise edition. And people actually buy it. I guess the feature is pretty important for some users. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On 16.06.2017 03:08, Bruce Momjian wrote: > > Yeah, I guess we will just have to wait to see it since other people are > excited about it. My concern is code complexity and usability > challenges, vs punting the problem to the operating system, though > admittedly there are some cases where that is not possible. > Let me also share my opinion about encryption and compression support at the database level. PostgresPro Enterprise does support both. I gave a presentation about it at PgConf 2016 in Tallinn. I was a little bit surprised that there were more questions about encryption than about compression. But right now we have several customers who are using compression and none of them use encryption (just because they do not need to protect their databases). But I am absolutely sure that there are many Postgres users who first of all need to protect their data. Encryption is much easier to implement than compression, because it does not change the page size. So I do not see any "complexity and flexibility challenges" here. Just for reference, I have attached our own encryption patch to this mail. I do not want to propose it as an alternative to Ants Aasma's patch: it is less flexible and doesn't support encryption of WAL, just encryption of relation data. Also it doesn't allow custom encryption libraries: the AES implementation is embedded. The encryption key is taken from an environment variable. At the Tallinn conference I was informed about a possible security issue with passing the key through an environment variable: it is possible to inspect the server's environment variables using a plpython/plperl stored procedure. This is why we unset this environment variable after reading it. I am not an expert in security, but I do not know of other issues with such a solution. Concerning the question of whether to implement compression/encryption at the database level or rely on the OS, my opinion is that there are many scenarios where it is not possible or not desirable to use OS-level encryption/protection. This first of all includes cloud installations and embedded applications. I do not want to repeat arguments already mentioned in this thread. But the fact is that there are many people who really need compression/encryption support and they cannot or do not want to delegate these aspects to the OS. Almost all DBMSes support compression and encryption, so the lack of these features in Postgres definitely cannot be considered an advantage. The Postgres buffer manager interface significantly simplifies integration of encryption and compression. There is actually a single path through which data is fetched from and stored to disk. The most obvious and natural solution is to decompress/decrypt data when it is read from disk into the page pool and compress/encrypt it when it is written back. Taking into account that memory is cheap now and many databases can completely fit in memory, storing pages in the buffer cache in plain (decompressed/decrypted) format minimizes the overhead of compression/encryption and its influence on performance. For read-only queries working with cached data, performance will be exactly the same as without encryption/compression. Write speed for encrypted pages will certainly be slightly worse, but encryption speed is still much higher than disk I/O speed. So I do not think that it is really necessary to support encryption of only some particular tables, storing "non-secret" data in plain format without encryption.
It would not noticeably improve performance, but it may complicate the implementation and increase the possibility of leaking secret data. I do not think that the pluggable storage API is the right approach for integrating compression and especially encryption. It is better to plug encryption in between the buffer manager and the storage device, allowing it to be used with any storage implementation. Also, it is not clear to me whether encryption of WAL can be provided using the pluggable storage API. The last question discussed here is whether it is necessary to encrypt temporary data (BufFile). In our solution we encrypt only the main fork of non-system relations and do not encrypt temporary relations. This may result in some secret data being stored on disk in non-encrypted form, but accessing this data is not trivial: you cannot just copy/steal the disk, open the database and do "select * from SecretTable"; you would have to extract the data from the raw files yourself. So it looks like it is better to let the user choose whether to encrypt temporary data or not. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
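To make the "single path through the buffer manager" argument above concrete, here is a minimal standalone sketch (not taken from either patch; the function names and the trivial XOR stand-in for a real cipher are purely illustrative) of an encrypt-on-write / decrypt-on-read choke point. Pages stay plain in memory and are only transformed at the I/O boundary:

/* Standalone sketch: all page I/O funnels through one place where pages are
 * encrypted on the way out and decrypted on the way in.  The toy XOR
 * "cipher" is NOT secure; it only marks where a real cipher would run. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 8192

/* Placeholder for a real block cipher keyed with the cluster master key. */
static void
toy_cipher(uint8_t *page, uint64_t blockno)
{
	for (int i = 0; i < PAGE_SIZE; i++)
		page[i] ^= (uint8_t) (blockno + i);
}

/* Encrypt a copy of the buffer and write it; the in-memory page stays plain. */
static void
encrypted_write_page(int fd, const uint8_t *page, uint64_t blockno)
{
	uint8_t		tmp[PAGE_SIZE];

	memcpy(tmp, page, PAGE_SIZE);
	toy_cipher(tmp, blockno);
	(void) pwrite(fd, tmp, PAGE_SIZE, (off_t) blockno * PAGE_SIZE);
}

/* Read the ciphertext and decrypt it before handing the page to the caller. */
static void
encrypted_read_page(int fd, uint8_t *page, uint64_t blockno)
{
	if (pread(fd, page, PAGE_SIZE, (off_t) blockno * PAGE_SIZE) == PAGE_SIZE)
		toy_cipher(page, blockno);	/* the XOR cipher is its own inverse */
}

int
main(void)
{
	uint8_t		page[PAGE_SIZE] = "hello, page";
	uint8_t		back[PAGE_SIZE];
	int			fd = open("page_demo.bin", O_CREAT | O_RDWR | O_TRUNC, 0600);

	encrypted_write_page(fd, page, 0);
	encrypted_read_page(fd, back, 0);
	printf("round trip: %s\n", (char *) back);	/* prints "hello, page" */
	close(fd);
	return 0;
}

A real implementation would replace toy_cipher() with AES keyed by the master key and a per-block tweak, but the control flow would look the same.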
On Thu, Jun 15, 2017 at 08:08:05PM -0400, Bruce Momjian wrote: > On Thu, Jun 15, 2017 at 04:56:36PM -0700, Andres Freund wrote: > > how few concerns about this feature's complexity / maintainability > > impact have been raised. > > Yeah, I guess we will just have to wait to see it since other people are > excited about it. My concern is code complexity and usability > challenges, vs punting the problem to the operating system, though > admittedly there are some cases where that is not possible. I know some OS's can create file systems inside files. Can you encrypt such file storage as non-root? I assume that is just too odd. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Fri, Jun 16, 2017 at 11:06:39AM +0300, Konstantin Knizhnik wrote: > Encryption is much easier to implement than compression, because it is not > changing page size. So I do not see any "complexity and flexibility > challenges" here. > Just for reference I attached to this mail our own encryption patch. I do I didn't see you using CPU AES instructions, which can improve performance by 3-10x. Is there a reason? > Postgres buffer manager interface significantly simplifies integration of > encryption and compression. There is actually single path through which data > is fetched/stored to the disk. > It is most obvious and natural solution to decompress/decrypt data when it > is read from the disk to page pool and compress/encrypt it when it is > written back. Taken in account that memory is cheap now and many databases > can completely fit in memory, storing pages in the buffer cache in plain > (decompressed/decrypted) format allows to minimize overhead of > compression/encryption and its influence on performance. For read only > queries working with cached data performance will be exactly the same as > without encryption/compression. > Write speed for encrypted pages will be certainly slightly worse, but still > encryption speed is much higher than disk IO speed. Good point. > I do not think that pluggable storage API is right approach to integrate > compression and especially encryption. It is better to plugin encryption > between buffer manager and storage device, > allowing to use it with any storage implementation. Also it is not clear to > me whether encryption of WAL can be provided using pluggable storage API. Yes, you are completely correct. I withdraw my suggestion of doing it as plugin storage. > The last discussed question is whether it is necessary to encrypt temporary > data (BufFile). In our solution we encrypt only main fork of non-system > relations and do no encrypt temporary relations. It may cause that some > secrete data will be stored at this disk in non-encrypted format. But > accessing this data is not trivial. You can not just copy/stole disk, open > database and do "select * from SecreteTable": you will have to extract data > from raw file yourself. So looks like it is better to allow user to make > choice whether to encrypt temporary data or not. If we go forward with in-db encryption, I think we are going to have to have a discussion about what parts of PGDATA need to be encrypted, i.e., I don't think pg_clog needs encryption. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
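On the AES-NI question above: OpenSSL's EVP layer dispatches to AES-NI automatically whenever the CPU advertises it, so encryption done through EVP gets the hardware speedup without extra work. A quick standalone check for the instruction set (x86 only; CPUID leaf 1, ECX bit 25) looks like this:

/* Report whether the CPU advertises the AES-NI instruction set. */
#include <cpuid.h>
#include <stdio.h>

int
main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 25)))
		printf("AES-NI available\n");
	else
		printf("AES-NI not available\n");
	return 0;
}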
Hi all, I've noticed this thread got resurrected a few days ago, but I haven't managed to read all the messages until today. I do have a bunch of comments, but let me share them as a single consistent message instead of sending a thousand responses to individual messages. 1) Threat model --------------- Firstly, I think the thread would seriously benefit from an explanation and discussion of the threat model - what types of attacks it's meant to address, and what attacks it can't defend against. My understanding is that data-at-rest encryption generally addresses only the "someone stole the disk" case and pretty much nothing else. Moreover, I claim that encryption implemented at the database level is strictly weaker compared to LUKS or encrypted disks, because it simply reveals a lot more information even without decryption (file sizes, timestamps, etc.) That is a serious issue in practice, and researchers have been proving that for a long time now. I do recommend this paper from Cornell Tech as a great starting point (it cites many papers relevant to this thread): Why Your Encrypted Database Is Not Secure Paul Grubbs, Thomas Ristenpart, Vitaly Shmatikov http://eprint.iacr.org/2017/468.pdf The paper explains how encryption schemes on general-purpose databases fail, due to exactly such side-channels. MVCC, logging and other side channels turn all attackers into "persistent passive attackers". Now, this does not mean the feature is useless - nothing is perfect, and security is not a binary feature. It certainly makes attacks more difficult compared to a plaintext database. But it's untrue that it's basically LUKS, just implemented at the database level. I'm not suggesting that we should not pursue this idea, but the threat model is a crucial piece of information missing in this thread. 2) How do other databases do it? -------------------------------- It was repeatedly mentioned that other databases support this type of encryption. So how do they deal with the hard parts? For example, how do they get and protect the encryption key? I agree with Stephen that we should not require full key management from v1 of the patch, that's an incredibly difficult thing. And it largely depends on the hardware (e.g. it should be possible to move the key to TrustZone on ARM / SGX on Intel). 3) Why do users prefer this to FDE? ----------------------------------- I suppose we're discussing this feature because we've been asked about it by users/customers who can't use FDE. Can we reach out to them and ask them about the reasons? Why can't they use FDE? It was mentioned in the thread that the FDE solutions are not portable between different systems, but honestly - is that an issue? You probably can't copy the datadir between systems anyway due to locale differences. If you're running multiple operating systems, FDE is just one of many differences. 4) Other solutions? ------------------- Clearly, FDE (at the block device level) and DB-level encryption are not the only solutions. There are also filesystem-level solutions now. For example, ext4 (since kernel 4.1) and f2fs (since kernel 4.2) allow encryption at the directory level, are transparent to user space, and include things like key management (well, that's provided by the kernel). NTFS can do something quite similar using EFS. https://lwn.net/Articles/639427/ https://blog.quarkslab.com/a-glimpse-of-ext4-filesystem-level-encryption.html Of course, if you happen to use another filesystem (e.g. XFS), this won't work for you.
But then there's eCryptfs, for example: https://en.wikipedia.org/wiki/ECryptfs regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 15, 2017 at 7:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I thought we called it "incremental development". From the opposite > point of view, would you say we should ban use of passphrase-protected > SSL key files because the current user interface for them is bad? I think that we've got a number of features which exist in the tree today only because either (a) our standards were lower at the time that those features were committed than they are today or (b) we didn't realize how much trouble those features were going to create. Just because we don't want to hose the people already using those features does not mean that we want more features engineered to that same quality level. Obviously, there's room for debate in any particular case about how reasonable it is to expect someone who wants to implement A to also improve B, and, well, maybe handling things as we do SSL certificates is good enough for this feature, too. I find myself a bit skeptical about that, though. It precludes a lot of things we might want to do. You're not going to be able to interface with some external key management server that way, nor do encryption of only part of the database, nor have multiple keys for different parts of the database using that kind of setup. One could argue that can all be added later, but I think there's a question about how much that's going to affect the code structure. Surely we don't want to install encryption v1 and then find that, by not considering key management, we've made it really hard to get to v2, and that it basically can't be done without ripping the whole implementation out and replacing it. Maybe the database needs, at some rather low level, a concept of whether the encryption key (or an encryption key) is available yet, and maybe you get out of considering that by deciding you're just going to prompt very early in startup, but then when someone comes along and wants to improve things later, they've got to try to go find all of the code that depends on the assumption that the key is always available and fix it. That could be pretty annoying to validate. I think it's better to give at least some consideration to these key management questions from the beginning, lest we back ourselves into a corner. Whether or not the SSL-passphrase style implementation is above or below the level we'd consider a minimally viable feature, it's surely not where we want to end up, and we shouldn't do anything that makes it likely that we'll get stuck at precisely that point. Also, practically, I think that type of solution is going to be extremely difficult to use for most people. It means that the database can't be started by the system startup scripts; you'll have to log into the PG OS user account and launch the database from there. IIUC, that means it won't be able to be controlled by things like systemd, which just know about start and stop, but not about asking for a password in the middle. Maybe I'm wrong about that, though. And certainly, there will be some users for whom starting the database manually and prompting for a password will be good enough, if not ideal. But for people who want to fetch the encryption key from a key management server, which I bet is a lot of people, that's not really going to be good enough.
I'm not really sure that rushing a first patch that "works" for sufficiently small values of "works" is actually going > I have no use for data-at-rest encryption myself, but I wouldn't stop > development just because the initial design proposal doesn't include > top-notch key management. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jun 19, 2017 at 12:30 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jun 15, 2017 at 7:51 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> I thought we called it "incremental development". From the opposite >> point of view, would you say we should ban use of passphrase-protected >> SSL key files because the current user interface for them is bad? > > I think that we've got a number of features which exist in the tree > today only because either (a) our standards were lower at the time > that those features were committed than they are today or (b) we > didn't realize how much trouble those features were going to create. > Just because we don't want to hose the people already using those > features does not mean that we want more features engineered to that > same quality level. Obviously, there's room for debate in any > particular case about how reasonable it is to expect someone who wants > to implement A to also improve B, and, well, maybe handling thing as > we do SSL certificates is good enough for this feature, too. I find > myself a bit skeptical about that, though. It preclude as lot of > things we might want to do. You're not going to be able to interface > with some external key management server that way, nor do encryption > of only part of the database, nor have multiple keys for different > parts of the database using that kind of setup. > > One could argue that can all be added later, but I think there's a > question about how much that's going to affect the code structure. > Surely we don't want to install encryption v1 and then find that, by > not considering key management, we've made it really hard to get to > v2, and that it basically can't be done without ripping the whole > implementation out and replacing it. Maybe the database needs, at > some rather low level, a concept of whether the encryption key (or an > encryption key) is available yet, and maybe you get out of considering > that by deciding you're just going to prompt very early in startup, > but now when someone comes along and wants to improve things later, > and they've got to try to go find all of the code that depends on the > assumption that the key is always available and fix it. That could be > pretty annoying to validate. I think it's better to give at least > some consideration to these key management questions from the > beginning, lest we back ourselves into a corner. Whether or not the > SSL-passphrase style implementation is above or below the level we'd > consider a minimally viable feature, it's surely not where we want to > end up, and we shouldn't do anything that makes it likely that we'll > get stuck at precisely that point. > > Also, practically, I think that type of solution is going to be > extremely difficult to use for most people. It means that the > database can't be started by the system startup scripts; you'll have > to log into the PG OS user account and launch the database from there. > IIUC, that means it won't be able to be controlled by things like > systemd, that just know about start and stop, but not about ask for a > password in the middle. Maybe I'm wrong about that, though. And > certainly, there will be some users for whom starting the database > manually and prompting for a password will be good enough, if not > ideal. But for people who want to fetch the encryption key from a key > management server, which I bet is a lot of people, that's not really > going to be good enough. 
I'm not really sure that rushing a first > patch that "works" for sufficiently small values of "works" is > actually going ...to move us forward very much. >> I have no use for data-at-rest encryption myself, but I wouldn't stop >> development just because the initial design proposal doesn't include >> top-notch key management. I agree with that, but there's a difference between "not top-notch" and "pretty bad". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Ants Aasma <ants.aasma@gmail.com> wrote: > Attached to this mail is a work in progress patch that adds an > extensible encryption mechanism. There are some loose ends left to tie > up, but the general concept and architecture is at a point where it's > ready for some feedback, fresh ideas and bikeshedding. A rebased patch is attached here, in case it helps to achieve (some of) the goals mentioned in the related thread [1]. Besides encrypting table and WAL pages, it encrypts the temporary files (buffile.c), data stored during logical decoding (reorderbuffer.c) and statistics temporary files (pgstat.c). Unlike the previous version, SLRU files (e.g. CLOG) are not encrypted (it does not seem critical, and the encryption makes torn page writes quite difficult to handle). Another difference is that we now use the OpenSSL implementation of the (tweaked) AES XTS cipher. Binary upgrade from an unencrypted to an encrypted cluster is not implemented yet. [1] https://www.postgresql.org/message-id/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Attachment
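For readers unfamiliar with XTS, here is a self-contained sketch of encrypting one 8kB page with OpenSSL's AES-128-XTS, roughly in the spirit of the message above. The key handling and the tweak layout are placeholders, not what the attached patch does; the point is only the shape of the EVP calls and the role of the 16-byte tweak, which makes identical pages at different locations encrypt differently. Build with: cc xts_demo.c -lcrypto

#include <openssl/evp.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Encrypt (encrypt = 1) or decrypt (encrypt = 0) one page with AES-128-XTS.
 * The 32-byte key is really two independent 16-byte AES keys. */
static int
xts_page(const unsigned char key[32], const unsigned char tweak[16],
		 const unsigned char *src, unsigned char *dst, int encrypt)
{
	EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
	int			len,
				ok;

	ok = EVP_CipherInit_ex(ctx, EVP_aes_128_xts(), NULL, key, tweak, encrypt)
		&& EVP_CipherUpdate(ctx, dst, &len, src, PAGE_SIZE)
		&& EVP_CipherFinal_ex(ctx, dst + len, &len);
	EVP_CIPHER_CTX_free(ctx);
	return ok;
}

int
main(void)
{
	unsigned char key[32], tweak[16] = {0};
	unsigned char page[PAGE_SIZE] = "some page contents";
	unsigned char enc[PAGE_SIZE], dec[PAGE_SIZE];
	uint64_t	blockno = 42;

	/* Demo key only; XTS requires the two key halves to differ. */
	memset(key, 0x11, 16);
	memset(key + 16, 0x22, 16);

	/* A simple tweak: just the block number.  A real design would also mix
	 * in identifiers for the relation's physical location. */
	memcpy(tweak, &blockno, sizeof(blockno));

	if (!xts_page(key, tweak, page, enc, 1) || !xts_page(key, tweak, enc, dec, 0))
		return 1;
	printf("round trip ok: %s\n", (char *) dec);
	return 0;
}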
Hi. I am interested in the "WIP: Data at rest encryption" patch. I applied this patch ("data-at-rest-encryption-wip-2018.06.27.patch") to PostgreSQL 11-beta2 and it is running. Since the explanation of this patch mentions "data stored during logical decoding", we tried logical decoding with the test_decoding module, but the following error occurs when creating a slot. pgbench_db=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding'); ERROR: invalid magic number B419 in log segment 000000020000000000000010, offset 0 pgbench_db=# (Also, if you run "CREATE SUBSCRIPTION" for logical replication from another server, a similar error occurs.) Question: in "data-at-rest-encryption-wip-2018.06.27.patch", is logical decoding still not implemented? Or is another logical decoding plug-in needed for "WIP: Data at rest encryption"? Regards. Antonin Houska <ah@cybertec.at> wrote: > Ants Aasma <ants.aasma@gmail.com> wrote: > > > Attached to this mail is a work in progress patch that adds an > > extensible encryption mechanism. There are some loose ends left to tie > > up, but the general concept and architecture is at a point where it's > > ready for some feedback, fresh ideas and bikeshedding. > > Rebased patch is attached here, in case it helps to achieve (some of) the > goals mentioned in the related thread [1]. > > Besides encrypting table and WAL pages, it encrypts the temporary files > (buffile.c), data stored during logical decoding (reorderbuffer.c) and > statistics temporary files (pgstat.c). Unlike the previous version, SLRU files > (e.g. CLOG) are not encrypted (it does not seem critical and the encryption > makes torn page write quite difficult to handle). > > Another difference is that we use the OpenSSL of the (tweaked) AES XTS cipher > now. > > Binary upgrade from unencrypted to encrypted cluster is not implemented yet. > > > [1] https://www.postgresql.org/message-id/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp > > -- > Antonin Houska > Cybertec Schonig & Schonig GmbH > Grohrmuhlgasse 26, A-2700 Wiener Neustadt > Web: https://www.cybertec-postgresql.com >
Hi. I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 and I'm working on it. When this patch is applied, the following problem occurs. * An error occurs when CHECKPOINT is executed during two-phase commit. * After an error occurs, if you stop PostgreSQL, it will never start again. (1) First, execute PREPARE TRANSACTION. postgres=# BEGIN; BEGIN postgres=# PREPARE TRANSACTION 'foo'; PREPARE TRANSACTION postgres=# (2) Execute the CHECKPOINT command from another terminal. CHEKPOINT command fails. postgres=# CHECKPOINT; ERROR: checkpoint request failed HINT: Consult recent messages in the server log for details. postgres=# (3) ROLLBACK PREPARED command also fails. postgres=# ROLLBACK PREPARED 'foo'; ERROR: could not read two-phase state from WAL at 0/167EBA0 postgres=# (4) Shut down the PostgreSQL server. During shutdown, a "could not read two-phase state from WAL" error occurs. 2018-07-23 14:49:08.924 JST [15821] LOG: received fast shutdown request 2018-07-23 14:49:08.925 JST [15821] LOG: aborting any active transactions 2018-07-23 14:49:08.925 JST [15831] FATAL: terminating connection due to administrator command 2018-07-23 14:49:08.928 JST [15821] LOG: background worker "logical replication launcher" (PID 15829) exited with exit code1 2018-07-23 14:49:08.928 JST [15824] LOG: shutting down 2018-07-23 14:49:08.935 JST [15824] FATAL: could not read two-phase state from WAL at 0/167EBA0 2018-07-23 14:49:08.936 JST [15821] LOG: checkpointer process (PID 15824) exited with exit code 1 2018-07-23 14:49:08.936 JST [15821] LOG: terminating any other active server processes 2018-07-23 14:49:08.937 JST [15821] LOG: abnormal database system shutdown 2018-07-23 14:49:08.945 JST [15821] LOG: database system is shut down (5) When restarting the PostgreSQL server, an error(could not read two-phase state from WAL) occurs and the PostgreSQL server can not be started. 2018-07-23 14:49:42.489 JST [15864] LOG: listening on IPv6 address "::1", port 5432 2018-07-23 14:49:42.489 JST [15864] LOG: listening on IPv4 address "127.0.0.1", port 5432 2018-07-23 14:49:42.492 JST [15864] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2018-07-23 14:49:42.521 JST [15866] LOG: database system shutdown was interrupted; last known up at 2018-07-23 14:49:08JST 2018-07-23 14:49:42.674 JST [15866] LOG: database system was not properly shut down; automatic recovery in progress 2018-07-23 14:49:42.676 JST [15866] LOG: redo starts at 0/167EB60 2018-07-23 14:49:42.676 JST [15866] LOG: invalid record length at 0/167EC70: wanted 24, got 0 2018-07-23 14:49:42.676 JST [15866] LOG: redo done at 0/167EC30 2018-07-23 14:49:42.677 JST [15866] FATAL: could not read two-phase state from WAL at 0/167EBA0 2018-07-23 14:49:42.678 JST [15864] LOG: startup process (PID 15866) exited with exit code 1 2018-07-23 14:49:42.678 JST [15864] LOG: aborting startup due to startup process failure 2018-07-23 14:49:42.682 JST [15864] LOG: database system is shut down Regards. ---- Harada Toshi. NTT TechnoCross Corporation Antonin Houska <ah@cybertec.at> wrote: > Ants Aasma <ants.aasma@gmail.com> wrote: > > > Attached to this mail is a work in progress patch that adds an > > extensible encryption mechanism. There are some loose ends left to tie > > up, but the general concept and architecture is at a point where it's > > ready for some feedback, fresh ideas and bikeshedding. > > Rebased patch is attached here, in case it helps to achieve (some of) the > goals mentioned in the related thread [1]. 
> > Besides encrypting table and WAL pages, it encrypts the temporary files > (buffile.c), data stored during logical decoding (reorderbuffer.c) and > statistics temporary files (pgstat.c). Unlike the previous version, SLRU files > (e.g. CLOG) are not encrypted (it does not seem critical and the encryption > makes torn page write quite difficult to handle). > > Another difference is that we use the OpenSSL of the (tweaked) AES XTS cipher > now. > > Binary upgrade from unencrypted to encrypted cluster is not implemented yet. > > > [1] https://www.postgresql.org/message-id/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp > > -- > Antonin Houska > Cybertec Schonig & Schonig GmbH > Grohrmuhlgasse 26, A-2700 Wiener Neustadt > Web: https://www.cybertec-postgresql.com >
Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > I am interested in a patch of "WIP: Data at rest encryption". > This patch("data-at-rest-encryption-wip-2018.06.27.patch") is applied to PostgreSQL 11-beta 2 and it is running. > > In the explanation of this patch, since "data stored during logical decoding" is written, > we tried logical decoding by the test_decoding module, but the following error occurs when creating a slot. > > > pgbench_db=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding'); > ERROR: invalid magic number B419 in log segment 000000020000000000000010, offset 0 > pgbench_db=# I could not reproduce this error ... > (Also, if you run "CREATE SUNSCRIPTION" for logical replication from another server, a similar error will occur.) ... but I succeeded for this. The problem was that a separate XLOG reader callback is used for logical replication and that did not decrypt the XLOG page. Fixed in the new version of the patch. Do you still see pg_create_logical_replication_slot() failing? > Question. > In "data-at-rest-encryption-wip-2018.06.27.patch", is logical decoding still not implemented? > Or is there a need for another logical decoding plug-in for "WIP: Data at rest encryption"? No, there's nothing special about logical decoding. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Attachment
Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > Hi. > > I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 and I'm working on it. > > When this patch is applied, the following problem occurs. > > * An error occurs when CHECKPOINT is executed during two-phase commit. > * After an error occurs, if you stop PostgreSQL, it will never start again. > > (1) First, execute PREPARE TRANSACTION. > > postgres=# BEGIN; > BEGIN > postgres=# PREPARE TRANSACTION 'foo'; > PREPARE TRANSACTION > postgres=# > > (2) Execute the CHECKPOINT command from another terminal. > CHEKPOINT command fails. > > postgres=# CHECKPOINT; > ERROR: checkpoint request failed > HINT: Consult recent messages in the server log for details. > postgres=# The patch version I posted in https://www.postgresql.org/message-id/11678.1532519255%40localhost fixes an issue (uninitialized pointer) that caused the failure here, but it was a SEGFAULT rather than an ERROR. And the scope of the bug was broader than just CHECKPOINT. Can you please test it again with the new version of the patch? Anyway, thanks for your reports! -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Hi. Applying https://www.postgresql.org/message-id/11678.1532519255%40localhost patch, the problem of pg_create_logical_replication_slot () and the 2PC problem were solved. Thanks. Antonin Houska <ah@cybertec.at> wrote: > Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > > > Hi. > > > > I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 and I'm working on it. > > > > When this patch is applied, the following problem occurs. > > > > * An error occurs when CHECKPOINT is executed during two-phase commit. > > * After an error occurs, if you stop PostgreSQL, it will never start again. > > > > (1) First, execute PREPARE TRANSACTION. > > > > postgres=# BEGIN; > > BEGIN > > postgres=# PREPARE TRANSACTION 'foo'; > > PREPARE TRANSACTION > > postgres=# > > > > (2) Execute the CHECKPOINT command from another terminal. > > CHEKPOINT command fails. > > > > postgres=# CHECKPOINT; > > ERROR: checkpoint request failed > > HINT: Consult recent messages in the server log for details. > > postgres=# > > The patch version I posted in > > https://www.postgresql.org/message-id/11678.1532519255%40localhost > > fixes an issue (unitialized pointer) that caused failure here, but it was > SEGFAULT rather than ERROR. And the scope of the bug was broader than just > CHECKPOINT. > > Can you please test it again with the new version of the patch? > > Anyway, thanks for your reports! > > > -- > Antonin Houska > Cybertec Schonig & Schonig GmbH > Grohrmuhlgasse 26, A-2700 Wiener Neustadt > Web: https://www.cybertec-postgresql.com >
Hi. If "data-at-rest-encryption-wip-2018.07.25.patch" is applied to PostgreSQL 11-beta3 released last week, patch execution will fail as follows. ---- patching file src/backend/replication/logical/reorderbuffer.c Hunk #9 FAILED at 2464. 1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_rewind/pg_rewind.c.rej 1 out of 6 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/controldata.c.rej ---- (See the attached file for the entire patch log) Antonin Houska <ah@cybertec.at> wrote: > Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > > > Hi. > > > > I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 and I'm working on it. > > > > When this patch is applied, the following problem occurs. > > > > * An error occurs when CHECKPOINT is executed during two-phase commit. > > * After an error occurs, if you stop PostgreSQL, it will never start again. > > > > (1) First, execute PREPARE TRANSACTION. > > > > postgres=# BEGIN; > > BEGIN > > postgres=# PREPARE TRANSACTION 'foo'; > > PREPARE TRANSACTION > > postgres=# > > > > (2) Execute the CHECKPOINT command from another terminal. > > CHEKPOINT command fails. > > > > postgres=# CHECKPOINT; > > ERROR: checkpoint request failed > > HINT: Consult recent messages in the server log for details. > > postgres=# > > The patch version I posted in > > https://www.postgresql.org/message-id/11678.1532519255%40localhost > > fixes an issue (unitialized pointer) that caused failure here, but it was > SEGFAULT rather than ERROR. And the scope of the bug was broader than just > CHECKPOINT. > > Can you please test it again with the new version of the patch? > > Anyway, thanks for your reports! > > > -- > Antonin Houska > Cybertec Schonig & Schonig GmbH > Grohrmuhlgasse 26, A-2700 Wiener Neustadt > Web: https://www.cybertec-postgresql.com >
Attachment
Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > Hi. > > If "data-at-rest-encryption-wip-2018.07.25.patch" is applied to PostgreSQL 11-beta3 released last week, > patch execution will fail as follows. > ---- > patching file src/backend/replication/logical/reorderbuffer.c > Hunk #9 FAILED at 2464. > > 1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_rewind/pg_rewind.c.rej > > 1 out of 6 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/controldata.c.rej The patch should be applicable to the master branch. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Hi. Even if you apply "data-at-rest-encryption-wip-2018.07.25.patch" to the master branch, the same patch error will occur. (Patch error of pg_rewind was eliminated) ---- patching file src/backend/replication/logical/reorderbuffer.c Hunk #9 FAILED at 2464. patching file src/bin/pg_upgrade/controldata.c Hunk #1 FAILED at 58. ---- (See the attached file for the entire patch log) Antonin Houska <ah@cybertec.at> wrote: > Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > > > Hi. > > > > If "data-at-rest-encryption-wip-2018.07.25.patch" is applied to PostgreSQL 11-beta3 released last week, > > patch execution will fail as follows. > > > ---- > > patching file src/backend/replication/logical/reorderbuffer.c > > Hunk #9 FAILED at 2464. > > > > 1 out of 7 hunks FAILED -- saving rejects to file src/bin/pg_rewind/pg_rewind.c.rej > > > > 1 out of 6 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/controldata.c.rej > > The patch should be applicable to the master branch. > > -- > Antonin Houska > Cybertec Schonig & Schonig GmbH > Grohrmuhlgasse 26, A-2700 Wiener Neustadt > Web: https://www.cybertec-postgresql.com >
Attachment
Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > Even if you apply "data-at-rest-encryption-wip-2018.07.25.patch" to the master branch, > the same patch error will occur. The version attached to this email should be applicable to the current branch. Sorry for the delay. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Attachment
On Fri, Sep 14, 2018 at 10:04 AM Antonin Houska <ah@cybertec.at> wrote: > Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > > Even if you apply "data-at-rest-encryption-wip-2018.07.25.patch" to the master branch, > > the same patch error will occur. > > The version attached to this email should be applicable to the current > branch. Sorry for the delay. I have a few comments on this patch. It doesn't seem useful to go through and make really detailed review comments on everything at this stage, because it's pretty far from being committable, at least IMHO. However, I have a few general thoughts. First of all, this is an impressive piece of work. It's obvious even from looking at this quickly that a lot of energy has gone into making this work, and it touches a lot of stuff, not without reason. For example, it's quite impressive that this touches things like the write-ahead log, logical decoding, and the stats collector, all of which write to disk. However, this is also quite invasive. It changes a lot of code and it doesn't do so in a very predictable way. It's not like you went through and replaced every call to write() with a call to SpecialEncryptionMagicWrite(). Rather, there are new #ifdef USE_ENCRYPTION blocks all over the code base. I think it will be important to look for ways to refactor this functionality to reduce that sort of thing to the absolute minimum possible. Encryption needs to be something that the code needs to know about here and there, but not everywhere. Some of the changes are things that could perhaps be done as separate, preparatory patches. For instance, pgstat.c gets a heavy refactoring to make it do I/O in bigger chunks. There's no reason that you couldn't separate some of those changes out, clean them up, and get them committed separately. It would be more efficient even as things stand. OK, well, there is one reason: we may be getting a shared-memory stats collector soon, and if we do, then the need for all of this will go away. But the general point is valid: whatever is a useful cleanup apart from this patch should be done separately from this patch. As far as possible, we need to try to do things in the same way in the encrypted and non-encrypted cases. For example, it's pretty hard to feel good about code like this: + sz_hdr = sizeof(ReorderBufferDiskChange); + if (data_encrypted) +#ifdef USE_ENCRYPTION + sz_hdr = TYPEALIGN(ENCRYPTION_BLOCK, sz_hdr); +#else + ENCRYPTION_NOT_SUPPORTED_MSG; +#endif /* USE_ENCRYPTION */ You won't be able to have much confidence that this code works both with and without encryption unless you test it both ways, because the file format is different in those two cases, and not just by being encrypted. That means that anyone who modifies this code in the future has two ways of breaking it. They can break the normal case, or they can break the encrypted case. If encryption were available as a sort of service that everyone could use, then that would probably be fine, but like I said above, I think something as invasive as this currently is will lead to a lot of complaints. The code needs a lot more documentation, not only SGML documentation but also code comments and probably a README file someplace. There is a lot of discussion in the comments here of encryption tweaks, but there's no general introduction to the topic (what's a tweak?) or -- and I think this is important -- what the idea behind the choice of various tweaks in different places actually is.
There's probably some motivating philosophy behind the way the tweaks are chosen that should be explained someplace. Think about it this way: if some hapless developer, let's say me, wants to add a new module to PostgreSQL which happens to write data to disk, that person needs an easy way to understand what they need to do to make that new code play nice with encryption. Right now, the code for every existing module is a little different (uggh) and there's this encryption tweak stuff about which you have to do something intelligent, but there's nothing that tells you how to be intelligent about it, at least not that I see. People are going to need something that looks more like a formula that they can just copy, and a good introduction to the principles in some appropriate place, too. The interaction of this capability with certain tricks that PostgreSQL plays needs some thought -- and documentation, at least developer documentation, maybe user documentation. One area is recovery. Writing WAL relies on the fact that if you are in the midst of rewriting a block and the power fails, bytes that are the same in both the old and new block images will be undisturbed; encryption will make that false. How's that going to work? Hint bits also rely on this principle. I thought there might be some interaction between this work and wal_log_hints for this reason, but I see nothing of that sort in the patch. Full page writes seem related too; I don't know how something like this can be safe without both full_page_writes and wal_log_hints forced on. I don't know whether using encryption should forcibly override those settings or whether it should just refuse to start unless they are set properly, but I think it probably needs to do one or the other. Hope this helps in some way. I think it would be good if this effort went forward in some way. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
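As a strawman for the missing "what's a tweak?" documentation: a tweak is an extra per-block input to the cipher that makes identical plaintext pages at different physical locations produce different ciphertext. One plausible way to pack a 16-byte tweak from a page's location is sketched below; the exact fields and layout used by the patch may differ, this only shows the general shape of the idea.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t Oid;

/* Derive a 16-byte tweak from the physical location of a page, so every
 * distinct (tablespace, database, relation, block) gets a distinct value. */
static void
make_tweak(uint8_t tweak[16], Oid tablespace, Oid database, Oid relation,
		   uint32_t blockno)
{
	memcpy(tweak + 0, &tablespace, 4);
	memcpy(tweak + 4, &database, 4);
	memcpy(tweak + 8, &relation, 4);
	memcpy(tweak + 12, &blockno, 4);
}

int
main(void)
{
	uint8_t		t[16];

	make_tweak(t, 1663, 16384, 16385, 7);
	for (int i = 0; i < 16; i++)
		printf("%02x", t[i]);
	printf("\n");
	return 0;
}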
Thanks for feedback! Robert Haas <robertmhaas@gmail.com> wrote: > However, this is also quite invasive. It changes a lot of code and it > doesn't do so in a very predictable way. It's not like you went > through and replaced every call to write() with a call to > SpecialEncyptionMagicWrite(). Rather, there are new #ifdef > USE_ENCRYPTION blocks all over the code base. I think it will be > important to look for ways to refactor this functionality to reduce > that sort of thing to the absolute minimum possible. Encryption needs > to be something that the code needs to know about here and there, but > not everywhere. The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the code needs to compile even w/o OpenSSL (of course the encryption won't be enabled in that case). I'll try to reduce the use of this construct only to the code blocks that really reference the OpenSSL functions. > Some of the changes are things that could perhaps be done as separate, > preparatory patches. For instance, pgstat.c gets a heavy refactoring > to make it do I/O in bigger chunks. There's no reason that you > couldn't separate some of those changes out, clean them up, and get > them committed separately. It would be more efficient even as things > stand. OK, well, there is one reason: we may be getting a > shared-memory stats collector soon, and if we do, then the need for > all of this will go away. But the general point is valid: whatever is > a useful cleanup apart from this patch should be done separately from > this patch. We'll omit the stats collector related changes so far because the patch [1] is under active development, and that would require us to rebase our patch often. And if the stats files will eventually go away, we won't need to encrypt the statistics. > As far as possible, we need to try to do things in the same way in the > encrypted and non-encrypted cases. For example, it's pretty hard to > feel good about code like this: > > + sz_hdr = sizeof(ReorderBufferDiskChange); > + if (data_encrypted) > +#ifdef USE_ENCRYPTION > + sz_hdr = TYPEALIGN(ENCRYPTION_BLOCK, sz_hdr); > +#else > + ENCRYPTION_NOT_SUPPORTED_MSG; > +#endif /* USE_ENCRYPTION */ > > You won't be able to have much confidence that this code works both > with and without encryption unless you test it both ways, because the > file format is different in those two cases, and not just by being > encrypted. That means that anyone who modifies this code in the > future has two ways of breaking it. They can break the normal case, > or they can break the encrypted case. If encryption were available as > a sort of service that everyone could use, then that would probably be > fine, but like I said above, I think something as invasive as this > currently is will lead to a lot of complaints. The problem I tried to solve here is that the whole encryption block (16 bytes) needs to be read and decrypted even if you need only part of its data. In the snippet above I tried to ensure that the whole blocks are always read, but I agree this approach is fragile. Since buffile.c needs to handle this kind of problem (and it already does in the last patch version), I think even other than temporary files could be handled by this module. The patch below adds functions BufFileOpenTransient(), BufFileWriteTransient(), etc. that can replace OpenTransientFile, write(), etc. respectively. Once we implement the encryption, these functions will also hide handling of the encryption blocks from user. 
In this (preliminary) patch, as an example, I applied this approach to ReorderBufferSerializeChange() and the function seems a bit simpler now. (ReorderBufferRestoreChange() would need more work to adopt this approach.) > The code needs a lot more documentation, not only SGML documentation > but also code comments and probably a README file someplace. ok, we'll do that. > The interaction of this capability with certain tricks that PostgreSQL > plays needs some thought -- and documentation, at least developer > documentation, maybe user documentation. One area is recovery. > Writing WAL relies on the fact that if you are in the midst of > rewriting a block and the power fails, bytes that are the same in both > the old and new block images will be undisturbed; encryption will make > that false. How's that going to work? Yes, if only a single bit is different, decryption will turn the whole encryption block (16 bytes) into garbage. So we need to enforce full_page_writes=on if encryption is enabled. The last version of the patch does not do that. > Hint bits also rely on this principle. I thought there might be some > interaction between this work and wal_log_hints for this reason, but I see > nothing of that sort in the patch. I'm not sure if the hint bit is still a problem if we first copy the shared buffer to backend-local memory and encrypt it there. That's what the patch does. > Full page writes seem related too; I don't know how something like this can > be safe without both full_page_writes and wal_log_hints forced on. I don't > know whether using encryption should forcibly override those settings or > whether it should just refuse to start unless they are set properly, but I > think it probably needs to do one or the other. Maybe it's more polite to refuse to start so that the user knows what's going on. I'm not sure if PG ever changes any configuration variable forcibly. > Hope this helps in some way. I think it would be good if this effort > went forward in some way. Sure, it helps! Thanks. [1] https://commitfest.postgresql.org/22/1708/ -- Antonin Houska https://www.cybertec-postgresql.com
Attachment
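A rough standalone sketch of the wrapper idea described above (names borrowed from the proposal, behaviour guessed; real encryption is elided): callers just append bytes, and the wrapper hides the fact that an encrypted file can only be written and decrypted in whole 16-byte cipher blocks by buffering writes and zero-padding the tail on close.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define ENCRYPTION_BLOCK 16
#define TRANSIENT_BUF	8192

typedef struct TransientFile
{
	int			fd;
	size_t		used;
	uint8_t		buf[TRANSIENT_BUF];
} TransientFile;

static TransientFile *
transient_open(const char *path)
{
	static TransientFile tf;	/* one file at a time; demo only */

	tf.fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
	tf.used = 0;
	return &tf;
}

/* Append bytes; flush the buffer (a multiple of 16 bytes) when it fills up. */
static void
transient_write(TransientFile *tf, const void *data, size_t len)
{
	while (len > 0)
	{
		size_t		chunk = TRANSIENT_BUF - tf->used;

		if (chunk > len)
			chunk = len;
		memcpy(tf->buf + tf->used, data, chunk);
		tf->used += chunk;
		data = (const uint8_t *) data + chunk;
		len -= chunk;
		if (tf->used == TRANSIENT_BUF)
		{
			/* ... encrypt tf->buf here before writing ... */
			(void) write(tf->fd, tf->buf, TRANSIENT_BUF);
			tf->used = 0;
		}
	}
}

/* Zero-pad the final partial block to the cipher block size, flush, close. */
static void
transient_close(TransientFile *tf)
{
	size_t		padded = (tf->used + ENCRYPTION_BLOCK - 1)
		& ~(size_t) (ENCRYPTION_BLOCK - 1);

	memset(tf->buf + tf->used, 0, padded - tf->used);
	/* ... encrypt tf->buf here before writing ... */
	if (padded > 0)
		(void) write(tf->fd, tf->buf, padded);
	close(tf->fd);
}

int
main(void)
{
	TransientFile *tf = transient_open("transient_demo.bin");

	transient_write(tf, "twenty-three bytes here", 23);
	transient_close(tf);		/* file length becomes 32, not 23 */
	return 0;
}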
On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska <ah@cybertec.at> wrote: > The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the code > needs to compile even w/o OpenSSL (of course the encryption won't be enabled > in that case). I'll try to reduce the use of this construct only to the code > blocks that really reference the OpenSSL functions. That sounds like a good place to start. The other part will be avoiding referencing the OpenSSL functions from too many places. For example, right now we have secure_read() -- which I know some people hate, or at least hate the name of] -- which lets you send data to the client without having to care whether that data is going to be SSL-encrypted on its way over the network. One can argue about whether that particular abstraction is properly-designed, but as Tom recently felt the urge to remind me on another thread, the value of abstraction in general is not in doubt. I think we need one here, so that the SSL-specific bits can be isolated in a relatively small number of files, and then clients can ask to write a file, or write a block to a file, or whatever, and ignore the question of whether that is going to get encrypted somehow behind the scenes. > Since buffile.c needs to handle this kind of problem (and it already does in > the last patch version), I think even other than temporary files could be > handled by this module. The patch below adds functions BufFileOpenTransient(), > BufFileWriteTransient(), etc. that can replace OpenTransientFile, write(), > etc. respectively. Once we implement the encryption, these functions will also > hide handling of the encryption blocks from user. In this (preliminary) patch > As an example, I applied this approach to ReorderBufferSerializeChange() and > the function seems a bit simpler now. (ReorderBufferRestoreChange() would need > more work to adopt this approach.) Yeah, I don't have time to look at this in detail right now, but this kind of thing seems like a promising approach to me. > > The interaction of this capability with certain tricks that PostgreSQL > > plays needs some thought -- and documentation, at least developer > > documentation, maybe user documentation. One area is recovery. > > Writing WAL relies on the fact that if you are in the midst of > > rewriting a block and the power fails, bytes that are the same in both > > the old and new block images will be undisturbed; encryption will make > > that false. How's that going to work? > > Yes, if only a single bit is different, decryption will turn the whole > encryption block (16 bytes) into garbage. So we need to enforce > full_page_writes=on if encryption is enabled. The last version of the patch > does not do that. I believe that we need full_page_writes=on and wal_log_hints=on, but they don't fix the problem that I'm describing. Suppose that the very last page in the write-ahead log contains 3.5kB of good data and 7kB of zeroes. Now, somebody comes along and writes a new write-ahead log record that is 1k in size. In memory, we update the block: it now contains 4.5kB of good data and 6.5kB of zeroes. Now, as we're writing the block to do disk, the power is removed. Consequently, the first 4kB hits the platter and the other 4kB disappears into the void. If the WAL is not encrypted, the state at this point is that the 3.5kB of good data that we had previously is still there, and the first 500 bytes of the new WAL record are also there, and the rest is missing. 
Recovery will detect that there's a valid-looking record at end of WAL, but when it tries to checksum that record, it will fail, and so that trailing half-record will just get ignored. And that's fine. If the WAL *is* encrypted, the state at this point is that the block is unreadable, because the first 4kB of the block is the first half of the bits that resulted from encrypting 8kB of data that includes the new record, and the second 4kB of the block is the second half of the bits that resulted from encrypting 8kB of data that did not include the new record, and since the new record perturbed every bit on the page with probability ~50%, what that means is you now have garbage. That means that not only did we lose the new record, but we also lost the 3.5kB of good data which the page previously contained. That's NOT ok. Some of the changes associated with those WAL records may have been flushed to disk already, or there may be commits in there that were acknowledged to the client, and we can't just lose them. One idea I have for fixing this problem is to encrypt the individual WAL records rather than the blocks. That leaks some information, of course, but maybe not in a way that really matters. > > Hint bits also rely on this principle. I thought there might be some > > interaction between this work and wal_log_hints for this reason, but I see > > nothing of that sort in the patch. > > I'm not sure if the hint bit is still a problem if we first copy the shared > buffer to backend-local memory and encrypt it there. That's what the patch > does. That has the same problem that I described in the previous paragraph, except for the relation data files rather than the WAL itself. See the manual's description of wal_log_hints: <para> When this parameter is <literal>on</literal>, the <productname>PostgreSQL</productname> server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint, even for non-critical modifications of so-called hint bits. </para> For encryption to work, you need that. A hint bit write might convert the page to garbage, just as in the previous example, and this option make sure that if that happens, there will be an FPW, allowing you to recover... unless of course you also hit the problem I described above, but assuming that's fixed somehow you still have this issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > If the WAL *is* encrypted, the state at this point is that the block > is unreadable, because the first 4kB of the block is the first half of > the bits that resulted from encrypting 8kB of data that includes the > new record, and the second 4kB of the block is the second half of the > bits that resulted from encrypting 8kB of data that did not include > the new record, and since the new record perturbed every bit on the > page with probability ~50%, what that means is you now have garbage. > That means that not only did we lose the new record, but we also lost > the 3.5kB of good data which the page previously contained. That's > NOT ok. Some of the changes associated with those WAL records may > have been flushed to disk already, or there may be commits in there > that were acknowledged to the client, and we can't just lose them. ISTM that this is only a problem if you choose the wrong encryption method. One not-wrong encryption method is to use a stream cipher --- maybe that's not the exact right technical term, but anyway I'm talking about a method which notionally XOR's the cleartext data with a random bit stream generated from the encryption key (probably along with other knowable inputs such as the block number). In such a method, corruption of individual on-disk bytes doesn't prevent you from getting the correct decryption of on-disk bytes that aren't corrupted. regards, tom lane
On Fri, Mar 15, 2019 at 5:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > ISTM that this is only a problem if you choose the wrong encryption > method. One not-wrong encryption method is to use a stream cipher > --- maybe that's not the exact right technical term, but anyway > I'm talking about a method which notionally XOR's the cleartext > data with a random bit stream generated from the encryption key > (probably along with other knowable inputs such as the block number). > In such a method, corruption of individual on-disk bytes doesn't > prevent you from getting the correct decryption of on-disk bytes > that aren't corrupted. Oh, that seems like it might be a really good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
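A quick way to convince oneself of the property Tom describes, using AES-256-CTR as the "keystream XOR" cipher: if the old and new images of a WAL page share a prefix (the records already written) and are encrypted with the same key and per-page nonce, the two ciphertexts share that prefix too, so a torn write cannot garble records that were already on disk. (Reusing the keystream when only the tail changes is exactly the information leak that would need separate analysis.) A hedged standalone demo, built with cc ctr_demo.c -lcrypto:

#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>

#define PAGE 8192

static void
ctr_encrypt(const unsigned char *key, const unsigned char *iv,
			const unsigned char *in, unsigned char *out)
{
	EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
	int			len;

	EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
	EVP_EncryptUpdate(ctx, out, &len, in, PAGE);
	EVP_EncryptFinal_ex(ctx, out + len, &len);
	EVP_CIPHER_CTX_free(ctx);
}

int
main(void)
{
	unsigned char key[32], iv[16];
	unsigned char oldpg[PAGE] = {0}, newpg[PAGE] = {0};
	unsigned char oldct[PAGE], newct[PAGE];

	memset(key, 0x42, sizeof(key));
	memset(iv, 0, sizeof(iv));			/* per-page nonce; fixed for the demo */

	memset(oldpg, 'A', 3584);			/* 3.5kB of existing WAL records */
	memcpy(newpg, oldpg, 3584);
	memset(newpg + 3584, 'B', 1024);	/* a new 1kB record appended */

	ctr_encrypt(key, iv, oldpg, oldct);
	ctr_encrypt(key, iv, newpg, newct);

	printf("first 3.5kB of ciphertext identical: %s\n",
		   memcmp(oldct, newct, 3584) == 0 ? "yes" : "no");
	return 0;
}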
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > If the WAL *is* encrypted, the state at this point is that the block > > is unreadable, because the first 4kB of the block is the first half of > > the bits that resulted from encrypting 8kB of data that includes the > > new record, and the second 4kB of the block is the second half of the > > bits that resulted from encrypting 8kB of data that did not include > > the new record, and since the new record perturbed every bit on the > > page with probability ~50%, what that means is you now have garbage. > > That means that not only did we lose the new record, but we also lost > > the 3.5kB of good data which the page previously contained. That's > > NOT ok. Some of the changes associated with those WAL records may > > have been flushed to disk already, or there may be commits in there > > that were acknowledged to the client, and we can't just lose them. > > ISTM that this is only a problem if you choose the wrong encryption > method. One not-wrong encryption method is to use a stream cipher > --- maybe that's not the exact right technical term, but anyway > I'm talking about a method which notionally XOR's the cleartext > data with a random bit stream generated from the encryption key > (probably along with other knowable inputs such as the block number). > In such a method, corruption of individual on-disk bytes doesn't > prevent you from getting the correct decryption of on-disk bytes > that aren't corrupted. We actually use a block cipher (with a block size of 16 bytes), as opposed to a stream cipher. It's true that a partial write is a problem because if a single bit of the ciphertext changes, decryption will produce 16 bytes of garbage. However, I'm not sure whether a partial write can affect a unit as small as 16 bytes. Nevertheless, with the current version of our patch, PG should be resistant against such a partial write anyway because we chose to align XLOG records to 16 bytes (as long as the encryption is enabled) for the following reasons: If one XLOG record ends and the following one starts in the same encryption block, both records can get corrupted during streaming replication. The scenario looks like: 1) the first record is written on master (the unused part of the block contains zeroes), 2) the block is encrypted and its initial part (i.e. the number of bytes occupied by the first record in the plain text) is streamed to slave, 3) the second record is written on master, 4) the containing encryption block is encrypted again and the trailing part (i.e. the number of bytes occupied by the second record) is streamed, 5) decryption of the block on slave will produce garbage and thus corrupt both records. This is because the trailing part of the block was filled with zeroes during encryption, but it contains different data at decryption time. An alternative approach to this replication problem is that walsender decrypts the stream and walreceiver encrypts it again. While this could give us the advantage of having master and slave encrypted with different keys, it brings some additional complexity. For example, pg_basebackup would need to deal with encryption. This design decision can be changed, but there's one more thing to consider: if the XLOG stream is decrypted, the decryption cannot be disabled unless the XLOG records are aligned to 16 bytes (and in turn, the XLOG alignment cannot be enabled w/o initdb). -- Antonin Houska Web: https://www.cybertec-postgresql.com
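The alignment rule described above, spelled out in code: each XLOG record's length is rounded up to the cipher block size so that no two records ever share one 16-byte encryption block. The TYPEALIGN macro below mirrors the PostgreSQL macro of the same name; treat the snippet as illustration rather than as the patch's actual code.

#include <stdint.h>
#include <stdio.h>

#define ENCRYPTION_BLOCK 16
#define TYPEALIGN(ALIGNVAL, LEN) \
	(((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

int
main(void)
{
	uint32_t	lengths[] = {24, 42, 64, 113};

	for (int i = 0; i < 4; i++)
		printf("record of %u bytes occupies %lu bytes when encrypted\n",
			   (unsigned) lengths[i],
			   (unsigned long) TYPEALIGN(ENCRYPTION_BLOCK, lengths[i]));
	return 0;
}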
On Thu, Mar 21, 2019 at 7:46 AM Antonin Houska <ah@cybertec.at> wrote: > Nevertheless, with the current version of our patch, PG should be resistant > against such a partial write anyway because we chose to align XLOG records to > 16 bytes (as long as the encryption is enabled) for the following reasons: > > If one XLOG record ends and the following one starts in the same encryption > block, both records can get corrupted during streaming replication. The > scenario looks like: 1) the first record is written on master (the unused part > of the block contains zeroes), 2) the block is encrypted and its initial part > (i.e. the number of bytes occupied by the first record in the plain text) is > streamed to slave, 3) the second record is written on master, 4) the > containing encryption block is encrypted again and the trailing part (i.e. the > number of bytes occupied by the second record) is streamed, 5) decryption of > the block on slave will produce garbage and thus corrupt both records. This is > because the trailing part of the block was filled with zeroes during > encryption, but it contains different data at decryption time. Wouldn't Tom's proposal to use a stream cipher fix all this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska <ah@cybertec.at> wrote: > > > Hint bits also rely on this principle. I thought there might be some > > > interaction between this work and wal_log_hints for this reason, but I see > > > nothing of that sort in the patch. > > > > I'm not sure if the hint bit is still a problem if we first copy the shared > > buffer to backend-local memory and encrypt it there. That's what the patch > > does. > > That has the same problem that I described in the previous paragraph, > except for the relation data files rather than the WAL itself. See > the manual's description of wal_log_hints: > > <para> > When this parameter is <literal>on</literal>, the > <productname>PostgreSQL</productname> > server writes the entire content of each disk page to WAL during the > first modification of that page after a checkpoint, even for > non-critical modifications of so-called hint bits. > </para> > > For encryption to work, you need that. A hint bit write might convert > the page to garbage, just as in the previous example, and this option > make sure that if that happens, there will be an FPW, allowing you to > recover... I think I finally understand. Originally I thought the question is how to compute correct page checksum while the hint bits can be changed w/o exclusive lock on the buffer. Now I realize that it's more about *recovery*: if the hint bit change is followed by a torn page write, the hint bit can get changed on disk but the checksum might not get updated. The wrong checksum is detected during recovery, but if XLOG does not contain the corresponding full page image, we're not able to recover. And with encryption, the consequence is even worse because torn page write causes not only wrong checksum of otherwise useful page, but really damaged page. I'll enforce the FPW in the next version of the patch. -- Antonin Houska Web: https://www.cybertec-postgresql.com
On Thu, Apr 4, 2019 at 9:57 AM Antonin Houska <ah@cybertec.at> wrote: > I think I finally understand. Originally I thought the question is how to > compute correct page checksum while the hint bits can be changed w/o exclusive > lock on the buffer. Now I realize that it's more about *recovery*: if the hint > bit change is followed by a torn page write, the hint bit can get changed on > disk but the checksum might not get updated. The wrong checksum is detected > during recovery, but if XLOG does not contain the corresponding full page > image, we're not able to recover. > > And with encryption, the consequence is even worse because torn page write > causes not only wrong checksum of otherwise useful page, but really damaged > page. Correct. > I'll enforce the FPW in the next version of the patch. Cool. I'm willing to put some effort into trying to get this into v13 if you're willing to keep hacking on it, but there's probably a fair amount to do and a year can go by in a hurry. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > I'm willing to put some effort into trying to get this into v13 if > you're willing to keep hacking on it, but there's probably a fair > amount to do and a year can go by in a hurry. That'd be appreciated, especially by my boss. It doesn't seem likely that in this situation I'll be assigned many other tasks of higher priority. So yes, I expect to have quite some time for this patch. Thanks. -- Antonin Houska Web: https://www.cybertec-postgresql.com
On 4/4/19 5:52 PM, Antonin Houska wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> I'm willing to put some effort into trying to get this into v13 if >> you're willing to keep hacking on it, but there's probably a fair >> amount to do and a year can go by in a hurry. > That'd be appreciated, especially by my boss. It doesn't seem likely that in > this situation I'll be assigned many other tasks of higher priority. So yes, I > expect to have quite some time for this patch. Thanks. > if there is any chance of having this committed into core, we will of course provide the capacity to hack on this. the same of course applies to the optimizer patches tony has been working on. many thanks hans -- Hans-Jürgen Schönig Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
On Thu, Apr 4, 2019 at 11:52 AM Antonin Houska <ah@cybertec.at> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > > I'm willing to put some effort into trying to get this into v13 if > > you're willing to keep hacking on it, but there's probably a fair > > amount to do and a year can go by in a hurry. > > That'd be appreciated, especially by my boss. It doesn't seem likely that in > this situation I'll be assigned many other tasks of higher priority. So yes, I > expect to have quite some time for this patch. Thanks. Great. I think it will be appreciated by my boss, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Mar 21, 2019 at 7:46 AM Antonin Houska <ah@cybertec.at> wrote: > > Nevertheless, with the current version of our patch, PG should be resistant > > against such a partial write anyway because we chose to align XLOG records to > > 16 bytes (as long as the encryption is enabled) for the following reasons: > > > > If one XLOG record ends and the following one starts in the same encryption > > block, both records can get corrupted during streaming replication. The > > scenario looks like: 1) the first record is written on master (the unused part > > of the block contains zeroes), 2) the block is encrypted and its initial part > > (i.e. the number of bytes occupied by the first record in the plain text) is > > streamed to slave, 3) the second record is written on master, 4) the > > containing encryption block is encrypted again and the trailing part (i.e. the > > number of bytes occupied by the second record) is streamed, 5) decryption of > > the block on slave will produce garbage and thus corrupt both records. This is > > because the trailing part of the block was filled with zeroes during > > encryption, but it contains different data at decryption time. > > Wouldn't Tom's proposal to use a stream cipher fix all this? Yes it would make the extra alignment unnecessary, but our solution tries to meet specific requirements of disk encryption. Stream cipher appears to be incompatible with these requirements: https://en.wikipedia.org/wiki/Disk_encryption_theory Currently we try to use the CBC-ESSIV mode. It's worth to mention that in this mode, if the page is encrypted twice and if the plain data in the encryption block N (i.e. block of 16 bytes) changes before the 2nd encryption, the encrypted data will only change in blocks >= N. So the problem of losing already flushed WAL records should not happen. -- Antonin Houska Web: https://www.cybertec-postgresql.com
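The CBC property mentioned above can be checked with a short standalone sketch. This is not the patch code: a fixed all-zero IV stands in for the ESSIV-derived IV, and the key and the 64-byte stand-in "page" are made up. The page is encrypted twice with the same key and IV, the second time with a change only in its fourth 16-byte block:

/*
 * Illustrative sketch only, not the patch code.  Encrypt a 64-byte buffer
 * twice with AES-128-CBC using the same key and IV; the second time a byte
 * in the fourth 16-byte block is different.  With CBC, ciphertext block N
 * depends only on plaintext blocks 0..N, so the first three ciphertext
 * blocks stay identical.
 */
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>

static void
encrypt_cbc(const unsigned char *key, const unsigned char *iv,
            const unsigned char *in, unsigned char *out, int len)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int     outl;

    EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv);
    EVP_CIPHER_CTX_set_padding(ctx, 0);     /* len is a multiple of 16 */
    EVP_EncryptUpdate(ctx, out, &outl, in, len);
    EVP_EncryptFinal_ex(ctx, out + outl, &outl);
    EVP_CIPHER_CTX_free(ctx);
}

int
main(void)
{
    unsigned char key[16] = "0123456789abcdef";    /* made up */
    unsigned char iv[16] = {0};     /* stands in for the ESSIV-derived IV */
    unsigned char page1[64], page2[64];
    unsigned char ct1[64], ct2[64];

    memset(page1, 'A', sizeof(page1));
    memcpy(page2, page1, sizeof(page2));
    page2[48] = 'B';                /* change something in block 3 only */

    encrypt_cbc(key, iv, page1, ct1, sizeof(page1));
    encrypt_cbc(key, iv, page2, ct2, sizeof(page2));

    for (int blk = 0; blk < 4; blk++)
        printf("ciphertext block %d: %s\n", blk,
               memcmp(ct1 + blk * 16, ct2 + blk * 16, 16) == 0 ?
               "unchanged" : "differs");
    return 0;
}

The expected output is that ciphertext blocks 0-2 are unchanged and only block 3 differs, so ciphertext that was already streamed for the leading, unchanged part of the page remains valid.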
On Fri, Apr 5, 2019 at 11:22 AM Antonin Houska <ah@cybertec.at> wrote: > > Wouldn't Tom's proposal to use a stream cipher fix all this? > > Yes it would make the extra alignment unnecessary, but our solution tries to > meet specific requirements of disk encryption. Stream cipher appears to be > incompatible with these requirements: > > https://en.wikipedia.org/wiki/Disk_encryption_theory Hmm. Well, I don't know what to do about that, but I think this patch is going to be facing an uphill battle if the encrypted and unencrypted WAL formats use different alignment. > Currently we try to use the CBC-ESSIV mode. It's worth to mention that in this > mode, if the page is encrypted twice and if the plain data in the encryption > block N (i.e. block of 16 bytes) changes before the 2nd encryption, the > encrypted data will only change in blocks >= N. So the problem of losing > already flushed WAL records should not happen. Well, this is just a question of alignment. If WAL records are at least 16-byte aligned, then it should be fine. But I have a feeling they may just be MAXALIGN'd. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
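For reference, WAL records are normally MAXALIGN'd, i.e. rounded up to 8 bytes on common platforms, so a 16-byte encryption block boundary is indeed a stricter requirement. A tiny standalone sketch of the difference follows; the macro mirrors the shape of TYPEALIGN in c.h and the record lengths are made up:

/*
 * Illustrative sketch only.  ALIGN_UP mirrors the shape of the TYPEALIGN
 * macro in c.h; the record lengths below are made up.
 */
#include <stdio.h>
#include <stdint.h>

#define ALIGN_UP(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

int
main(void)
{
    size_t  lengths[] = {42, 50, 64};

    for (int i = 0; i < 3; i++)
        printf("record length %zu: 8-byte MAXALIGN -> %zu, 16-byte align -> %zu\n",
               lengths[i],
               (size_t) ALIGN_UP(8, lengths[i]),
               (size_t) ALIGN_UP(16, lengths[i]));
    return 0;
}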
Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Apr 5, 2019 at 11:22 AM Antonin Houska <ah@cybertec.at> wrote: > > > Wouldn't Tom's proposal to use a stream cipher fix all this? > > > > Yes it would make the extra alignment unnecessary, but our solution tries to > > meet specific requirements of disk encryption. Stream cipher appears to be > > incompatible with these requirements: > > > > https://en.wikipedia.org/wiki/Disk_encryption_theory > > Hmm. Well, I don't know what to do about that, but I think this patch > is going to be facing an uphill battle if the encrypted and > unencrypted WAL formats use different alignment. Once you said this could be a problem, I thought about it a bit more. The link above tells that stream cipher is not good for disk encryption because it's risky to encrypt different data (e.g. old an new version of disk sector) with the same key (or rather with the same combination of key and initialization vector). However WAL is a special case because here we should never need to change the data at given position. The only exception is that unused space (sequence of zeroes) becomes a valid record. So if we avoid encryption of the unused space, it might be o.k. to use the stream cipher for WAL. -- Antonin Houska Web: https://www.cybertec-postgresql.com
On Sat, Apr 6, 2019 at 03:29:13PM +0200, Antonin Houska wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > > > On Fri, Apr 5, 2019 at 11:22 AM Antonin Houska <ah@cybertec.at> wrote: > > > > Wouldn't Tom's proposal to use a stream cipher fix all this? > > > > > > Yes it would make the extra alignment unnecessary, but our solution tries to > > > meet specific requirements of disk encryption. Stream cipher appears to be > > > incompatible with these requirements: > > > > > > https://en.wikipedia.org/wiki/Disk_encryption_theory > > > > Hmm. Well, I don't know what to do about that, but I think this patch > > is going to be facing an uphill battle if the encrypted and > > unencrypted WAL formats use different alignment. > > Once you said this could be a problem, I thought about it a bit more. The link > above tells that stream cipher is not good for disk encryption because it's > risky to encrypt different data (e.g. old an new version of disk sector) with > the same key (or rather with the same combination of key and initialization > vector). > > However WAL is a special case because here we should never need to change the > data at given position. The only exception is that unused space (sequence of > zeroes) becomes a valid record. So if we avoid encryption of the unused space, > it might be o.k. to use the stream cipher for WAL. Agreed. Requirement #2 is not needed for WAL: https://en.wikipedia.org/wiki/Disk_encryption_theory#Problem_definition 2. Data retrieval and storage should both be fast operations, no matter where on the disk the data is stored. because you are not writing into random parts of the WAL. Using the same stream cipher system that SSH uses would make sense, since it is byte-oriented, and it only encrypts the tail. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
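A short standalone sketch of the byte-oriented behaviour described above. It is illustrative only and not from the patch: AES in CTR mode stands in for whatever stream mode would actually be chosen, and the key, IV and data are made up. Because each ciphertext byte depends only on the corresponding plaintext byte and the keystream, encrypting an append-only stream in two chunks produces exactly the same bytes as encrypting it in one call, so only the newly written tail ever has to be processed:

/*
 * Illustrative sketch only, not from the patch.  With a byte-oriented
 * stream mode (AES-CTR here), feeding the data through the cipher in two
 * chunks yields exactly the same ciphertext as one call over the whole
 * buffer, so an append-only stream such as WAL could be encrypted
 * incrementally, tail only.
 */
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>

int
main(void)
{
    unsigned char key[16] = "0123456789abcdef";    /* made up */
    unsigned char iv[16] = {0};     /* would be derived from the WAL position */
    unsigned char data[48];
    unsigned char whole[48], chunked[48];
    EVP_CIPHER_CTX *ctx;
    int     outl;

    memset(data, 'x', sizeof(data));

    /* Encrypt everything in one call. */
    ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
    EVP_EncryptUpdate(ctx, whole, &outl, data, 48);
    EVP_CIPHER_CTX_free(ctx);

    /* Encrypt the same data as an initial part plus an appended tail. */
    ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
    EVP_EncryptUpdate(ctx, chunked, &outl, data, 20);           /* written first */
    EVP_EncryptUpdate(ctx, chunked + 20, &outl, data + 20, 28); /* appended later */
    EVP_CIPHER_CTX_free(ctx);

    printf("%s\n", memcmp(whole, chunked, sizeof(whole)) == 0 ?
           "identical ciphertext" : "ciphertexts differ");
    return 0;
}

The flip side, as discussed above, is that the keystream for a given position must never be applied to two different plaintexts, which is why the zero-filled unused space would have to be left unencrypted.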
Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Apr 4, 2019 at 9:57 AM Antonin Houska <ah@cybertec.at> wrote: > > I'll enforce the FPW in the next version of the patch. > > Cool. > > I'm willing to put some effort into trying to get this into v13 if > you're willing to keep hacking on it, but there's probably a fair > amount to do and a year can go by in a hurry. Attached is the next version. It tries to address various problems pointed out upthread, including documentation. -- Antonin Houska Web: https://www.cybertec-postgresql.com
On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska <ah@cybertec.at> wrote: > Attached is the next version. It tries to address various problems pointed out > upthread, including documentation. It's not immediately evident to me, perhaps because I haven't looked at this stuff in a while, what the purpose of 0001 and 0002 is. I think it'd be helpful if you would generate these patches with 'git format-patch' and include a meaningful commit message in each patch of the series. + <para> + This setting specifies path to executable file that returns + either <literal>encryption_key=</literal> for key + and <literal>encryption_password=</literal> for password. + </para> This is incomprehensible. Executables don't return strings. Also, you can return "either a or b" or you can return "both a and b," but you can't return "either a and b". - Build with support for <acronym>SSL</acronym> (encrypted) - connections. This requires the <productname>OpenSSL</productname> - package to be installed. <filename>configure</filename> will check - for the required header files and libraries to make sure that + Build with support for on-disk data encryption + or <acronym>SSL</acronym> (encrypted) connections. This requires + the <productname>OpenSSL</productname> package to be + installed. <filename>configure</filename> will check for the + required header files and libraries to make sure that I think we should instead just remove the word 'connections'. We don't want to have multiple places in the documentation listing all the things for which OpenSSL is required. Generally, the documentation changes here have a lot of stuff about key-or-password, which is quite confusing. It seems like the idea is that you accept either a key or a password from which a key is derived, but I don't think that's explained super-clearly, and I wonder whether it's unnecessary complexity. Why not just require a key always? Another general point is that the documentation, comments, and README need some work on the English. They contain lots of information, but they read somewhat awkwardly and are hard to understand in some places. Wherever you are adding a large hunk of code into the middle of an existing function (e.g. XLogWrite), please consider putting the logic in a new function and calling it from the original function, instead of just inlining the entire thing. xloginsert.c adds a new #include even though there are no other changes to that file. That should not be necessary. If it is, you've broken the principle the header files are compilable independently of other header files (except postgres.h, which is a prerequisite for every header file). -XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr, - Size count) +XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr, Size count) Spurious hunk. Please try to avoid unnecessary changes. + * XXX Currently the function is only called with startptr at page + * boundary. If it should change, encryption needs to reflect this fact, + * i.e. read data from the beginning of the page. Actually this is a + * reason to adjust and use walsender.c:XLogRead(). This comment and the similar comment at the end of the function are not very clear. It sorta sounds, though, like maybe the decryption logic should move from here to the caller or something. You seem to be saying that, after your changes, this function depends on the caller using it in a certain way, which is often a sign that you haven't abstracted things in the right way. 
And if walsender.c's version of the logic is better, why not also do that stuff here? @@ -52,7 +54,6 @@ uint32 bootstrap_data_checksum_version = 0; /* No checksum */ - #define ALLOC(t, c) \ ((t *) MemoryContextAllocZero(TopMemoryContext, (unsigned)(c) * sizeof(t))) Another useless hunk. + { + RelFileNode fromNode = {srctablespace, src_dboid, InvalidOid}; + RelFileNode toNode = {dsttablespace, dboid, InvalidOid}; + + copydir(srcpath, dstpath, &fromNode, &toNode); + } I suggest instead declaring these variables in the surrounding scope and initializing them here. This style is unlike most PostgreSQL code. + case WAIT_EVENT_KDF_FILE_READ: + event_name = "KDFFileRead"; + break; New wait events need documentation. +are encrypted differently. Furthermore, as the Postgres page starts with LSN, +even if only the last encryption block of the page changes, the whole cipher +text of the page will be different after the next encryption. But for a hint-bit-only change, this would not be true, unless we force wal_log_hints. Sounds like we're doing that, but it might be worth mentioning it here also. There's also the question of unlogged tables, which do not have an LSN. This logic wouldn't apply there, which might be a problem. +If the encryption is enabled, the following requirements need to be taken into +account: + +1. The file buffer cannot be positioned at arbitrary offset of the file. If I think it's no good at all to think about enforcing requirements like this only when data_encryption is used. That's just going to lead to bugs. I think we need to either find a way to mask this restriction from the users of the buffile facility, or else revise all those users so that they respect this new rule and enforce it with an unconditional assert. That brings me to another point, which is that a complex patch like this really needs to be divided up into smaller patches for review. I see you've started that a little bit (but see comments on commit messages above) but you should try to go further with it. For example, changing the buffile contract so that there is some new rule about positioning should be done as a separate patch (or not done at all). That patch should be written so that it applies before the main encryption patch, have comments and asserts explaining the new rules, etc. +To store other kinds of data encrypted than the ones above, developers are +advised to use BufFileWriteTransient() and BufFileReadTransient() functions The previous paragraph is about BufFileWrite() and BufFileRead(), and that section is titled "temporary files." Now here you say that for other kinds of data (not relations, xlog, or temporary files) we should use BufFileWriteTransient(). That seems pretty weird, because data that it is not temporary files would presumably be, uh, permanent files? So then why is Transient in the name? I imagine there are good reasons here but it needs to be made clearer. +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with +the --link option.) That seems pretty unfortunate. Any way to do better? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Thanks for another round of review. Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska <ah@cybertec.at> wrote: > > Attached is the next version. It tries to address various problems pointed out > > upthread, including documentation. > > It's not immediately evident to me, perhaps because I haven't looked > at this stuff in a while, what the purpose of 0001 and 0002 is. I > think it'd be helpful if you would generate these patches with 'git > format-patch' and include a meaningful commit message in each patch of > the series. ok, will do. > + <para> > + This setting specifies path to executable file that returns > + either <literal>encryption_key=</literal> for key > + and <literal>encryption_password=</literal> for password. > + </para> > > This is incomprehensible. Executables don't return strings. That's right, the key should be written to the standard output. > Also, you can return "either a or b" or you can return "both a and b," but > you can't return "either a and b". Sure, this is a thinko. > - Build with support for <acronym>SSL</acronym> (encrypted) > - connections. This requires the <productname>OpenSSL</productname> > - package to be installed. <filename>configure</filename> will check > - for the required header files and libraries to make sure that > + Build with support for on-disk data encryption > + or <acronym>SSL</acronym> (encrypted) connections. This requires > + the <productname>OpenSSL</productname> package to be > + installed. <filename>configure</filename> will check for the > + required header files and libraries to make sure that > > I think we should instead just remove the word 'connections'. We > don't want to have multiple places in the documentation listing all > the things for which OpenSSL is required. ok > Generally, the documentation changes here have a lot of stuff about > key-or-password, which is quite confusing. It seems like the idea is > that you accept either a key or a password from which a key is > derived, but I don't think that's explained super-clearly, and I > wonder whether it's unnecessary complexity. Why not just require a > key always? We thought that it's more convenient for administrator to enter password. To achieve that, we can instead tell the admin how to use the key derivation tool (pg_keysetup) and send its output to postgres. So yes, it's possible to always receive the key from the "encryption key command". I'll change this. > Another general point is that the documentation, comments, and README > need some work on the English. They contain lots of information, but > they read somewhat awkwardly and are hard to understand in some > places. ok, I'll try to explain the tricky things in simple sentences. > Wherever you are adding a large hunk of code into the middle of an > existing function (e.g. XLogWrite), please consider putting the logic > in a new function and calling it from the original function, instead > of just inlining the entire thing. ok > xloginsert.c adds a new #include even though there are no other > changes to that file. That should not be necessary. If it is, you've > broken the principle the header files are compilable independently of > other header files (except postgres.h, which is a prerequisite for > every header file). It's not intentional, I failed to remove it when moving some encryption related function calls elsewhere.
> -XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr, > - Size count) > +XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr, > Size count) > Spurious hunk. Please try to avoid unnecessary changes. Maybe I added and later removed some argument and forgot to enforce the maximum line length. I expected pg_indent to fix things like this but it apparently does not. I'll pay more attention to such things next time. > + * XXX Currently the function is only called with startptr at page > + * boundary. If it should change, encryption needs to reflect this fact, > + * i.e. read data from the beginning of the page. Actually this is a > + * reason to adjust and use walsender.c:XLogRead(). > > This comment and the similar comment at the end of the function are > not very clear. It sorta sounds, though, like maybe the decryption > logic should move from here to the caller or something. You seem to > be saying that, after your changes, this function depends on the > caller using it in a certain way, which is often a sign that you > haven't abstracted things in the right way. And if walsender.c's > version of the logic is better, why not also do that stuff here? Please consider this an incomplete implementation of the XLOG decryption. Eventually these restrictions will be relaxed and the decryption will be hidden from caller. Currently there are two implementations of XLogRead() in the backend (plus one in the frontend) and I didn't want to spend too much effort on merging the decryption into core until this another patch gets committed: https://commitfest.postgresql.org/23/2098/ The current XLogRead() functions are not identical so the decryption specific code can't be simply copy & pasted. So far I'll turn this comment into a TODO comment and mention the refactoring patch there. > + { > + RelFileNode fromNode = {srctablespace, src_dboid, InvalidOid}; > + RelFileNode toNode = {dsttablespace, dboid, InvalidOid}; > + > + copydir(srcpath, dstpath, &fromNode, &toNode); > + } > > I suggest instead declaring these variables in the surrounding scope > and initializing them here. This style is unlike most PostgreSQL > code. ok > + case WAIT_EVENT_KDF_FILE_READ: > + event_name = "KDFFileRead"; > + break; > > New wait events need documentation. ok > +are encrypted differently. Furthermore, as the Postgres page starts with LSN, > +even if only the last encryption block of the page changes, the whole cipher > +text of the page will be different after the next encryption. > > But for a hint-bit-only change, this would not be true, unless we > force wal_log_hints. Sounds like we're doing that, but it might be > worth mentioning it here also. ok > There's also the question of unlogged tables, which do not have an > LSN. This logic wouldn't apply there, which might be a problem. I missed that. The only idea I have now is that a "fake LSN" is set even for unlogged table if encryption is enabled. Actually that LSN would only be a counter that gets incremented each time the page contents has changed. > +If the encryption is enabled, the following requirements need to be taken into > +account: > + > +1. The file buffer cannot be positioned at arbitrary offset of the file. If > > I think it's no good at all to think about enforcing requirements like > this only when data_encryption is used. That's just going to lead to > bugs. 
I think we need to either find a way to mask this restriction > from the users of the buffile facility, or else revise all those users > so that they respect this new rule and enforce it with an > unconditional assert. Sorry, this is misunderstanding. The requirements mentioned in the "Temporary files" section of the README apply to BufFileWrite() and BufFileRead() internally. These functions take care of the correct buffer positioning. The caller does not have to care. I'll try to make this part clearer. > That brings me to another point, which is that a complex patch like > this really needs to be divided up into smaller patches for review. I > see you've started that a little bit (but see comments on commit > messages above) but you should try to go further with it. Maintenance of a long patch series requires additional effort and I was busy enough with major code rearrangements so far. I'll continue splitting the stuff into more diffs. > +To store other kinds of data encrypted than the ones above, developers are > +advised to use BufFileWriteTransient() and BufFileReadTransient() functions > > The previous paragraph is about BufFileWrite() and BufFileRead(), and > that section is titled "temporary files." Now here you say that for > other kinds of data (not relations, xlog, or temporary files) we > should use BufFileWriteTransient(). That seems pretty weird, because > data that it is not temporary files would presumably be, uh, permanent > files? So then why is Transient in the name? I imagine there are > good reasons here but it needs to be made clearer. "Transient" here means that the file descriptor is kept open for rather short time, typically until transaction boundary (unlike relation files). However the file remains on disk even if the descriptor is closed. I believe this is consistent with the use of OpenTransientFile(). The phrase "other kinds of data" is probably too generic and more explanation is needed. > +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with > +the --link option.) > > That seems pretty unfortunate. Any way to do better? The reason is that the initialization vector (IV) for relation page encryption is derived from RelFileNode, and both dbNode and relNode can be different in the new cluster. Therefore the data of the old cluster need to be decrypted and encrypted using the new IV before it's copied to the new cluster. That's why we can't use symlink. As an alternative, I thought of deriving the IV by hashing the <database>.<schema>.<object name> string, but that would mean that the "reencryption" is triggered by commands like "ALTER TABLE ... RENAME TO ...", "ALTER TABLE ... SET SCHEMA ...", etc. Currently I'm thinking of making the IV less dependent on RelFileNode (e.g. generate a random number for which RelFileNode serves as the seed) and storing it in pg_class as a storage parameter. Therefore we wouldn't have to pay any extra attention to transfer of the IV to the new cluster. However, the IV storage parameter would need special treatment: 1. DBA should not be able to remove or change it using ALTER TABLE command because inability to decrypt the data can be the only outcome. 2. The \d+ command of psql client should not display the IV. Displaying it probably wouldn't be a security risk, but as we encrypt the whole instance, the IV would appear for every single table and that might be annoying. What do you think about this approach? -- Antonin Houska Web: https://www.cybertec-postgresql.com
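To illustrate why the IV construction matters for pg_upgrade, here is a minimal sketch. The exact tweak layout used by the patch may differ, and the struct below is only a simplified stand-in for RelFileNode with made-up OIDs. If the per-block IV is derived from the tablespace, database and relation OIDs plus the block number, then the same block copied into a cluster where those OIDs differ gets a different IV, so hard-linking the old files cannot work and the data has to be decrypted and re-encrypted:

/*
 * Illustrative sketch only; the patch's actual tweak/IV layout may differ.
 * RelFileNodeSketch is a simplified stand-in for RelFileNode and the OIDs
 * are made up.
 */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef struct RelFileNodeSketch
{
    uint32_t    spcNode;    /* tablespace OID */
    uint32_t    dbNode;     /* database OID */
    uint32_t    relNode;    /* relfilenode */
} RelFileNodeSketch;

/* Build a 16-byte per-block tweak from the relation identity and block number. */
static void
make_tweak(unsigned char tweak[16], const RelFileNodeSketch *node, uint32_t blockNum)
{
    memset(tweak, 0, 16);
    memcpy(tweak, &node->spcNode, 4);
    memcpy(tweak + 4, &node->dbNode, 4);
    memcpy(tweak + 8, &node->relNode, 4);
    memcpy(tweak + 12, &blockNum, 4);
}

int
main(void)
{
    RelFileNodeSketch old_node = {1663, 16384, 24576};  /* old cluster */
    RelFileNodeSketch new_node = {1663, 16390, 24999};  /* after pg_upgrade */
    unsigned char t_old[16], t_new[16];

    make_tweak(t_old, &old_node, 0);
    make_tweak(t_new, &new_node, 0);
    printf("tweak for block 0 %s\n",
           memcmp(t_old, t_new, 16) == 0 ? "matches" : "differs");
    return 0;
}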
On Tue, May 28, 2019 at 11:27 AM Antonin Houska <ah@cybertec.at> wrote: > We thought that it's more convenient for administrator to enter password. To > achieve that, we can instead tell the admin how to use the key derivation tool > (pg_keysetup) and send its output to postgres. So yes, it's possible to always > receive the key from the "encryption key command". I'll change this. Sounds good. I'm not quite sure how this is going to work. Ideally you'd like the encryption key command to fetch the key from something like ssh-agent, or maybe pop up a window on the user's terminal with a key prompt. Just reading from stdin and writing to stdout is not going to be very convenient... > Maintenance of a long patch series requires additional effort and I was busy > enough with major code rearrangements so far. I'll continue splitting the > stuff into more diffs. Right, I understand. Whatever can be split off can be reviewed and committed separately, which will start to ease the burden of maintaining the patch set. I hope. But getting over the initial hump can be a lot of work. > > +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with > > +the --link option.) > > > > That seems pretty unfortunate. Any way to do better? > > The reason is that the initialization vector (IV) for relation page encryption > is derived from RelFileNode, and both dbNode and relNode can be different in > the new cluster. Therefore the data of the old cluster need to be decrypted > and encrypted using the new IV before it's copied to the new cluster. That's > why we can't use symlink. > > As an alternative, I thought of deriving the IV by hashing the > <database>.<schema>.<object name> string, but that would mean that the > "reencryption" is triggered by commands like "ALTER TABLE ... RENAME TO ...", > "ALTER TABLE ... SET SCHEMA ...", etc. > > Currently I'm thinking of making the IV less dependent on RelFileNode > (e.g. generate a random number for which RelFileNode serves as the seed) and > storing it in pg_class as a storage parameter. Therefore we wouldn't have to > pay any extra attention to transfer of the IV to the new cluster. However, the > IV storage parameter would need special treatment: > > 1. DBA should not be able to remove or change it using ALTER TABLE command > because inability to decrypt the data can be the only outcome. > > 2. The \d+ command of psql client should not display the IV. Displaying it > probably wouldn't be a security risk, but as we encrypt the whole instance, > the IV would appear for every single table and that might be annoying. > > What do you think about this approach? I don't think it's a very good idea. For one thing, you can't store the IV for pg_class in pg_class; that has a circularity problem. For another thing, I'm not sure we can or should try to do catalog access from every place where an IV might be needed. In fact, I'd go so far as to say that sounds like a recipe for a lot of pain and fragility. One potential approach is to modify pg_upgrade to preserve the dbnode and relfilenode. I don't know of any fundamental reason why such a change shouldn't be possible, although it is probably a bunch of work, and there may be problems with which I am not acquainted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, May 28, 2019 at 11:27 AM Antonin Houska <ah@cybertec.at> wrote: > > We thought that it's more convenient for administrator to enter password. To > > achieve that, we can instead tell the admin how to use the key derivation tool > > (pg_keysetup) and send its output to postgres. So yes, it's possible to always > > receive the key from the "encryption key command". I'll change this. > > Sounds good. I'm not quite sure how this is going to work. Ideally > you'd like the encryption key command to fetch the key from something > like ssh-agent, or maybe pop up a window on the user's terminal with a > key prompt. Just reading from stdin and writing to stdout is not > going to be very convenient... When I mentioned writing to stdout in my previous email, I viewed it from the perspective of postmaster: whichever way the command gets the password from the DBA, it'll always write it to stdout and postmaster will read it from there. This was to address your objection that the executables do not actually "return" strings. > > > +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with > > > +the --link option.) > > > > > > That seems pretty unfortunate. Any way to do better? > > > > > > Currently I'm thinking of making the IV less dependent on RelFileNode > > (e.g. generate a random number for which RelFileNode serves as the seed) and > > storing it in pg_class as a storage parameter. > > I don't think it's a very good idea. For one thing, you can't store > the IV for pg_class in pg_class; that has a circularity problem. For > another thing, I'm not sure we can or should try to do catalog access > from every place where an IV might be needed. In fact, I'd go so far > as to say that sounds like a recipe for a lot of pain and fragility. > > One potential approach is to modify pg_upgrade to preserve the dbnode > and relfilenode. I don't know of any fundamental reason why such a > change shouldn't be possible, although it is probably a bunch of work, > and there may be problems with which I am not acquainted. Actually it doesn't seem to be that tricky. In the --binary-upgrade mode, pg_dump does record pg_class.oid: -- For binary upgrade, must preserve pg_class oids SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('57465'::pg_catalog.oid); The header comment in pg_upgrade.c indicates that this is because of TOAST. So I think that dbnode and relfilenode are not preserved just because there was no strong reason to do so by now. -- Antonin Houska Web: https://www.cybertec-postgresql.com
On Fri, May 31, 2019 at 2:59 AM Antonin Houska <ah@cybertec.at> wrote: > > Sounds good. I'm not quite sure how this is going to work. Ideally > > you'd like the encryption key command to fetch the key from something > > like ssh-agent, or maybe pop up a window on the user's terminal with a > > key prompt. Just reading from stdin and writing to stdout is not > > going to be very convenient... > > When I mentioned writing to stdout in my previous email, I viewed it from the > perspective of postmaster: whichever way the command gets the password from > the DBA, it'll always write it to stdout and postmaster will read it from > there. This was to address your objection that the executables do not actually > "return" strings. So the part about returning strings is really just a wording issue: the documentation needs to talk about data sent to stdout or wherever, because that's what really happens, and somebody writing such a command from scratch needs to know what it must do. What I'm talking about here is that it also has to be reasonably possible to write an encryption key command that does something useful. I don't have a really clear vision for how that's going to work. Nobody wants the server, which is probably being launched by pg_ctl or systemd or both, to prompt using its own stdin/stderr, but the we need to think about how the prompting is actually going to work. One idea is to do it via the FEBE protocol: connect to the server using libpq, issue a new command that causes the server to enter COPY mode, and then send the encryption key as COPY data. However, that idea doesn't really work, because we've got to be able to get the key before we run recovery or reach consistency, so the server won't be listening for connections at that point. Also, we've got to have a way for this to work in single-user mode, which also can't listen for connections. It's probably all fine if your encryption key command is something like 'curl https://my.secret.keyserver.me/sekcret.key' because then there's an external server which you can just go access - but I don't quite understand how you'd do interactive prompting from here. Sorry to get hung up on what may seem like a minor point, but I think it's actually fairly fundamental: we've got to have a clear vision of how end-users will really be able to make use of this. > The header comment in pg_upgrade.c indicates that this is because of TOAST. So > I think that dbnode and relfilenode are not preserved just because there was > no strong reason to do so by now. Maybe you want to propose an independent patch making that change? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, May 31, 2019 at 2:59 AM Antonin Houska <ah@cybertec.at> wrote: > > > Sounds good. I'm not quite sure how this is going to work. Ideally > > > you'd like the encryption key command to fetch the key from something > > > like ssh-agent, or maybe pop up a window on the user's terminal with a > > > key prompt. Just reading from stdin and writing to stdout is not > > > going to be very convenient... > > > > When I mentioned writing to stdout in my previous email, I viewed it from the > > perspective of postmaster: whichever way the command gets the password from > > the DBA, it'll always write it to stdout and postmaster will read it from > > there. This was to address your objection that the executables do not actually > > "return" strings. > > So the part about returning strings is really just a wording issue: > the documentation needs to talk about data sent to stdout or wherever, > because that's what really happens, and somebody writing such a > command from scratch needs to know what it must do. > > What I'm talking about here is that it also has to be reasonably > possible to write an encryption key command that does something > useful. I don't have a really clear vision for how that's going to > work. Nobody wants the server, which is probably being launched by > pg_ctl or systemd or both, to prompt using its own stdin/stderr, but > the we need to think about how the prompting is actually going to > work. Since you mentioned ssh-agent above, I think that postmaster can, instead of running a command, create an unix socket and read the key from there. (I refer only to the socket as a communication channel, not to the protocol - ssh-agent does not seem to actually send key over the socket.) Unlike the socket for backend connections, this one would only be readable by the user that runs postmaster, and would only exist during the encryption initialization. The simplest approach from the perspective of the DBA is that pg_ctl can write the key to the socket. Besides that we can also implement a separate utility to send the key, to be used in other special cases such as starting postgres via systemd. (If the unix socket is a problem on windows, we might need to use named pipe instead.) > > The header comment in pg_upgrade.c indicates that this is because of TOAST. So > > I think that dbnode and relfilenode are not preserved just because there was > > no strong reason to do so by now. > > Maybe you want to propose an independent patch making that change? I think of a separate diff in the encryption patch series. As the encryption is the only reason for this enhancement, I'm not sure if it deserves a separate CF entry. (Although it might be considered refactoring because eventually pg_upgrade won't need to handle the different relnodes, and thus it might become a little bit simpler.) -- Antonin Houska Web: https://www.cybertec-postgresql.com
Greetings, Antonin. You wrote 2019-06-05, 15:32: > Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, May 31, 2019 at 2:59 AM Antonin Houska <ah@cybertec.at> wrote: >> > > Sounds good. I'm not quite sure how this is going to work. Ideally >> > > you'd like the encryption key command to fetch the key from something >> > > like ssh-agent, or maybe pop up a window on the user's terminal with a >> > > key prompt. Just reading from stdin and writing to stdout is not >> > > going to be very convenient... >> > >> > When I mentioned writing to stdout in my previous email, I viewed it from the >> > perspective of postmaster: whichever way the command gets the password from >> > the DBA, it'll always write it to stdout and postmaster will read it from >> > there. This was to address your objection that the executables do not actually >> > "return" strings. >> >> So the part about returning strings is really just a wording issue: >> the documentation needs to talk about data sent to stdout or wherever, >> because that's what really happens, and somebody writing such a >> command from scratch needs to know what it must do. >> >> What I'm talking about here is that it also has to be reasonably >> possible to write an encryption key command that does something >> useful. I don't have a really clear vision for how that's going to >> work. Nobody wants the server, which is probably being launched by >> pg_ctl or systemd or both, to prompt using its own stdin/stderr, but >> the we need to think about how the prompting is actually going to >> work. > Since you mentioned ssh-agent above, I think that postmaster can, instead of > running a command, create an unix socket and read the key from there. (I refer > only to the socket as a communication channel, not to the protocol - ssh-agent > does not seem to actually send key over the socket.) Unlike the socket for > backend connections, this one would only be readable by the user that runs > postmaster, and would only exist during the encryption initialization. > The simplest approach from the perspective of the DBA is that pg_ctl can write > the key to the socket. Besides that we can also implement a separate utility > to send the key, to be used in other special cases such as starting postgres > via systemd. > (If the unix socket is a problem on windows, we might need to use named pipe > instead.) Yes, that's definitely a problem on Windows. Pipes seem reasonable. Are mapped files appropriate from a security point of view? >> > The header comment in pg_upgrade.c indicates that this is because of TOAST. So >> > I think that dbnode and relfilenode are not preserved just because there was >> > no strong reason to do so by now. >> >> Maybe you want to propose an independent patch making that change? > I think of a separate diff in the encryption patch series. As the encryption > is the only reason for this enhancement, I'm not sure if it deserves a > separate CF entry. (Although it might be considered refactoring because > eventually pg_upgrade won't need to handle the different relnodes, and thus it > might become a little bit simpler.) -- Kind regards, Pavlo mailto:pavlo.golub@cybertec.at
On Mon, Jun 3, 2019 at 12:04:54PM -0400, Robert Haas wrote: > > What I'm talking about here is that it also has to be reasonably > possible to write an encryption key command that does something > useful. I don't have a really clear vision for how that's going to > work. Nobody wants the server, which is probably being launched by > pg_ctl or systemd or both, to prompt using its own stdin/stderr, but > the we need to think about how the prompting is actually going to > work. One idea is to do it via the FEBE protocol: connect to the > server using libpq, issue a new command that causes the server to > enter COPY mode, and then send the encryption key as COPY data. > However, that idea doesn't really work, because we've got to be able > to get the key before we run recovery or reach consistency, so the > server won't be listening for connections at that point. Also, we've > got to have a way for this to work in single-user mode, which also > can't listen for connections. It's probably all fine if your > encryption key command is something like 'curl > https://my.secret.keyserver.me/sekcret.key' because then there's an > external server which you can just go access - but I don't quite > understand how you'd do interactive prompting from here. Sorry to get > hung up on what may seem like a minor point, but I think it's actually > fairly fundamental: we've got to have a clear vision of how end-users > will really be able to make use of this. pgcryptokey has an example of prompting from /dev/tty: http://momjian.us/download/pgcryptokey/ Look at pgcryptokey_default.sample. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Thu, Jun 13, 2019 at 09:14:13PM -0400, Bruce Momjian wrote: > On Mon, Jun 3, 2019 at 12:04:54PM -0400, Robert Haas wrote: > > > > What I'm talking about here is that it also has to be reasonably > > possible to write an encryption key command that does something > > useful. I don't have a really clear vision for how that's going to > > work. Nobody wants the server, which is probably being launched by > > pg_ctl or systemd or both, to prompt using its own stdin/stderr, but > > the we need to think about how the prompting is actually going to > > work. One idea is to do it via the FEBE protocol: connect to the > > server using libpq, issue a new command that causes the server to > > enter COPY mode, and then send the encryption key as COPY data. > > However, that idea doesn't really work, because we've got to be able > > to get the key before we run recovery or reach consistency, so the > > server won't be listening for connections at that point. Also, we've > > got to have a way for this to work in single-user mode, which also > > can't listen for connections. It's probably all fine if your > > encryption key command is something like 'curl > > https://my.secret.keyserver.me/sekcret.key' because then there's an > > external server which you can just go access - but I don't quite > > understand how you'd do interactive prompting from here. Sorry to get > > hung up on what may seem like a minor point, but I think it's actually > > fairly fundamental: we've got to have a clear vision of how end-users > > will really be able to make use of this. > > pgcryptoey has an example of prompting from /dev/tty: > > http://momjian.us/download/pgcryptokey/ > > Look at pgcryptokey_default.sample. Also, look at pgcryptokey_default.c and its use with shared_preload_libraries for a full example of prompting on server boot. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
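For illustration, a minimal C sketch of the /dev/tty idea. This is not the pgcryptokey sample itself (that one is a shell script); it simply assumes the convention discussed upthread that the encryption key command prints the passphrase on its stdout for the postmaster to read, and it omits error handling and Windows support:

/*
 * Illustrative sketch only; this is not the pgcryptokey sample.  A minimal
 * encryption key command that prompts on the controlling terminal instead
 * of the server's stdin/stderr and prints the passphrase on stdout for the
 * postmaster to read.  POSIX only, error handling trimmed.
 */
#include <stdio.h>
#include <string.h>
#include <termios.h>

int
main(void)
{
    FILE       *ttyin = fopen("/dev/tty", "r");
    FILE       *ttyout = fopen("/dev/tty", "w");
    struct termios saved, noecho;
    char        passphrase[256];

    if (ttyin == NULL || ttyout == NULL)
        return 1;               /* no controlling terminal available */

    /* Turn terminal echo off while the passphrase is typed. */
    tcgetattr(fileno(ttyin), &saved);
    noecho = saved;
    noecho.c_lflag &= ~ECHO;
    tcsetattr(fileno(ttyin), TCSAFLUSH, &noecho);

    fputs("Postgres encryption passphrase: ", ttyout);
    fflush(ttyout);
    if (fgets(passphrase, sizeof(passphrase), ttyin) == NULL)
        passphrase[0] = '\0';
    fputs("\n", ttyout);

    tcsetattr(fileno(ttyin), TCSAFLUSH, &saved);

    /* The postmaster reads the command's stdout. */
    passphrase[strcspn(passphrase, "\n")] = '\0';
    puts(passphrase);
    return 0;
}

A real key command would presumably also run the passphrase through a KDF, or hand that part to a tool like the pg_keysetup utility mentioned upthread, rather than printing it raw.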
Bruce Momjian <bruce@momjian.us> wrote: > On Mon, Jun 3, 2019 at 12:04:54PM -0400, Robert Haas wrote: > > > > What I'm talking about here is that it also has to be reasonably > > possible to write an encryption key command that does something > > useful. I don't have a really clear vision for how that's going to > > work. Nobody wants the server, which is probably being launched by > > pg_ctl or systemd or both, to prompt using its own stdin/stderr, but > > the we need to think about how the prompting is actually going to > > work. One idea is to do it via the FEBE protocol: connect to the > > server using libpq, issue a new command that causes the server to > > enter COPY mode, and then send the encryption key as COPY data. > > However, that idea doesn't really work, because we've got to be able > > to get the key before we run recovery or reach consistency, so the > > server won't be listening for connections at that point. Also, we've > > got to have a way for this to work in single-user mode, which also > > can't listen for connections. It's probably all fine if your > > encryption key command is something like 'curl > > https://my.secret.keyserver.me/sekcret.key' because then there's an > > external server which you can just go access - but I don't quite > > understand how you'd do interactive prompting from here. Sorry to get > > hung up on what may seem like a minor point, but I think it's actually > > fairly fundamental: we've got to have a clear vision of how end-users > > will really be able to make use of this. > > pgcryptoey has an example of prompting from /dev/tty: > > http://momjian.us/download/pgcryptokey/ > > Look at pgcryptokey_default.sample. It's a nice exercise of shell features but does not seem to be easily portable, and it has some other restrictions. One is mentioned in a comment: # Use 'pg_ctl -s' so starting dots do not interfere with password entry I think that besides the dots, log messages can also be disturbing if the logging collector is not enabled. Another corner case is when the user runs pg_ctl with the --no-wait option and wants to run some other command immediately. I'm thinking how to teach postmaster to accept FEBE protocol connections temporarily, just to receive the key. The user applications like pg_ctl, initdb or pg_upgrade would retrieve the key / password from the DBA, then start postmaster and send it the key. Perhaps the message format should be a bit generic so that extensions like this can use it to receive their keys too. (The idea of an unix socket or named pipe I proposed upthread is not good because it's harder to implement in a portable way.) -- Antonin Houska Web: https://www.cybertec-postgresql.com
On 2019-06-17 11:23, Antonin Houska wrote: > I'm thinking how to teach postmaster to accept FEBE protocol connections > temporarily, just to receive the key. The user applications like pg_ctl, > initdb or pg_upgrade would retrieve the key / password from the DBA, then > start postmaster and send it the key. > > Perhaps the message format should be a bit generic so that extensions like > this can use it to receive their keys too. > > (The idea of an unix socket or named pipe I proposed upthread is not good > because it's harder to implement in a portable way.) How are the requirements here different from ssl_passphrase_command? Why do we need a new mechanism? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 25, 2019 at 8:28 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > How are the requirements here different from ssl_passphrase_command? > Why do we need a new mechanism? I don't think that the requirements are different, and I don't think we need a new mechanism. I am curious exactly how you would set up ssl_passphrase_command to prompt interactively. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019-06-25 15:39, Robert Haas wrote: > On Tue, Jun 25, 2019 at 8:28 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> How are the requirements here different from ssl_passphrase_command? >> Why do we need a new mechanism? > > I don't think that the requirements are different, and I don't think > we need a new mechanism. > > I am curious exactly how you would set up ssl_passphrase_command to > prompt interactively. For example like this: https://www.2ndquadrant.com/en/blog/postgresql-passphrase-protected-ssl-keys-systemd/ -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Antonin Houska <ah@cybertec.at> wrote: > Thanks of another round of review. > > Robert Haas <robertmhaas@gmail.com> wrote: > > > On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska <ah@cybertec.at> wrote: > > > Attached is the next version. It tries to address various problems pointed out > > > upthread, including documentation. This is the next version. I've reworked the way key is passed to postmaster (v04-0003-...) so it's easier to call the encryption key command interactively, adjusted pg_upgrade so that it preserves database, tablespace and relfilenode OIDs (v04-0014-...), reorganized the code a bit and split the code into more diffs. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Attachment
- v04-0001-Make-postgres-aware-of-the-encryption.patch
- v04-0002-Prepare-ServerLoop-for-the-next-patch.patch
- v04-0003-Teach-postmaster-to-accept-encryption-key.patch
- v04-0004-Minor-refactoring-of-pg_ctl.c.patch
- v04-0005-Teach-pg_ctl-to-send-the-encryption-key-to-postmaste.patch
- v04-0006-Allow-user-to-use-password-instead-of-encryption-key.patch
- v04-0007-Refactoring-patch.-This-only-moves-some-code-to-XLog.patch
- v04-0008-Enable-XLOG-encryption.patch
- v04-0009-Enable-encryption-of-relations.patch
- v04-0010-Introduce-transient-buffered-file.patch
- v04-0011-Make-buffile.c-aware-of-encryption.patch
- v04-0012-Add-tests-for-buffile.c.patch
- v04-0013-Make-database-OID-constant-for-postgres-database.patch
- v04-0014-Teach-pg_upgrade-to-preserve-OIDs-of-relfilenode-dat.patch
- v04-0015-Make-pg_upgrade-aware-of-encryption.patch
- v04-0016-Teach-the-remaining-frontend-application-about-encry.patch
- v04-0017-User-and-development-documentation.patch
On Tue, Jun 25, 2019 at 02:28:00PM +0200, Peter Eisentraut wrote: > On 2019-06-17 11:23, Antonin Houska wrote: > > I'm thinking how to teach postmaster to accept FEBE protocol connections > > temporarily, just to receive the key. The user applications like pg_ctl, > > initdb or pg_upgrade would retrieve the key / password from the DBA, then > > start postmaster and send it the key. > > > > Perhaps the message format should be a bit generic so that extensions like > > this can use it to receive their keys too. > > > > (The idea of an unix socket or named pipe I proposed upthread is not good > > because it's harder to implement in a portable way.) > > How are the requirements here different from ssl_passphrase_command? > Why do we need a new mechanism? Agreed. My pgcryptokey prompting shell script was mostly a proof-of-concept. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, Jul 6, 2019 at 2:42 AM Antonin Houska <ah@cybertec.at> wrote: > I've reworked the way key is passed to postmaster (v04-0003-...) so it's > easier to call the encryption key command interactively, adjusted pg_upgrade > so that it preserves database, tablespace and relfilenode OIDs (v04-0014-...), > reorganized the code a bit and split the code into more diffs. Hi Antonin, Some robotic feedback: 1. On Linux, an assertion failed: ExceptionalCondition (conditionName=conditionName@entry=0x973891 "!(decrypt_p == p)", errorType=errorType@entry=0x928d7d "FailedAssertion", fileName=fileName@entry=0x973827 "xlogutils.c", lineNumber=lineNumber@entry=815) at assert.c:54 See full stack here: https://travis-ci.org/postgresql-cfbot/postgresql/builds/555550833 2. On Windows the build failed: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46469 -- Thomas Munro https://enterprisedb.com
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, failed Spec compliant: tested, failed Documentation: not tested Hi Antonin, I am very glad to see the new patch. I used the public patches a long time ago. I did some tests such as streaming replication, running with a lot of data, and temporary file encryption. I found an issue in src/backend/storage/file/encryption.c: you should put block_size = EVP_CIPHER_CTX_block_size(ctx); under the #ifdef USE_ASSERT_CHECKING. There is also a problem merging your patches onto the latest sources in pg_ctl.c. Regards, -- Shawn Wang
On 2019-Aug-02, Shawn Wang wrote: > Hi Antonin, > It is very glad to see the new patch. I used the public patches a long time ago. > I did some tests like the stream replication, much data running, temporary files encryption. > I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size = EVP_CIPHER_CTX_block_size(ctx);under the #ifdef USE_ASSERT_CHECKING. > There is some problem to merge your patches to the latest kernel in the pg_ctl.c. Is a new, fixed version going to be posted soon? It's been a while. Also, apologies if this has been asked before, but: how does this patch relate to the stuff being discussed in https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 3, 2019 at 10:21 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Aug-02, Shawn Wang wrote:
> > Hi Antonin,
> > It is very glad to see the new patch. I used the public patches a long time ago.
> > I did some tests like the stream replication, much data running, temporary files encryption.
> > I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size = EVP_CIPHER_CTX_block_size(ctx); under the #ifdef USE_ASSERT_CHECKING.
> > There is some problem to merge your patches to the latest kernel in the pg_ctl.c.
>
> Is a new, fixed version going to be posted soon? It's been a while.
>
> Also, apologies if this has been asked before, but: how does this patch
> relate to the stuff being discussed in
> https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ?

Yes, it is related to that, but we don't have that on our agenda in a weekly meeting. It has many conflicting points about what we are doing. Sawada / Bruce can comment on that.

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Ibrar Ahmed
Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2019-Aug-02, Shawn Wang wrote: > > > Hi Antonin, > > It is very glad to see the new patch. I used the public patches a long time ago. > > I did some tests like the stream replication, much data running, temporary files encryption. > > I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size = EVP_CIPHER_CTX_block_size(ctx);under the #ifdef USE_ASSERT_CHECKING. > > There is some problem to merge your patches to the latest kernel in the pg_ctl.c. > > Is a new, fixed version going to be posted soon? It's been a while. > > Also, apologies if this has been asked before, but: how does this patch > relate to the stuff being discussed in > https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ? This thread started later than our effort but important design questions are being discussed there. So far there seems to be no consensus whether full-instance encryption should be implemented first, so any effort spent on this patch might get wasted. When/if there will be an agreement on the design, we'll see how much of this patch can be used. -- Antonin Houska Web: https://www.cybertec-postgresql.com
---- On Wed, 04 Sep 2019 00:56:15 +0800 Alvaro Herrera <alvherre@2ndquadrant.com> wrote ----
> On 2019-Aug-02, Shawn Wang wrote:
> > Hi Antonin,
> > It is very glad to see the new patch. I used the public patches a long time ago.
> > I did some tests like the stream replication, much data running, temporary files encryption.
> > I found that there is an issue in the src/backend/storage/file/encryption.c. You should put block_size = EVP_CIPHER_CTX_block_size(ctx); under the #ifdef USE_ASSERT_CHECKING.
> > There is some problem to merge your patches to the latest kernel in the pg_ctl.c.
>
> Is a new, fixed version going to be posted soon? It's been a while.
>
> Also, apologies if this has been asked before, but: how does this patch relate to the stuff being discussed in
> https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ?

Hi Álvaro,

Thank you for your reply.

I was mainly referring to the issues in src/backend/storage/file/encryption.c. If somebody wants to use these patches, I think Antonin needs to fix them.

It does not relate to the stuff being discussed in the TDE thread. As far as I know, some companies use these patches to encrypt data, even if these issues don't matter.

Regards,
--
Shawn Wang
On Wed, Sep 04, 2019 at 06:56:18AM +0200, Antonin Houska wrote: > This thread started later than our effort but important design questions are > being discussed there. So far there seems to be no consensus whether > full-instance encryption should be implemented first, so any effort spent on > this patch might get wasted. When/if there will be an agreement on the design, > we'll see how much of this patch can be used. I see. I have marked the patch as returned with feedback in CF 2019-11. Let's see how the other one finishes, before perhaps coming back to this one. -- Michael