Re: Transparent Data Encryption (TDE) and encrypted files - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Transparent Data Encryption (TDE) and encrypted files |
Date | |
Msg-id | CA+TgmoYLPMiXR0hvLtvm1WxNyLCQCTcQFzzWa9bdQLyz3Mq2Gg@mail.gmail.com |
In response to | Re: Transparent Data Encryption (TDE) and encrypted files (Antonin Houska <ah@cybertec.at>) |
List | pgsql-hackers |
On Tue, Oct 8, 2019 at 7:52 AM Antonin Houska <ah@cybertec.at> wrote:
> * Temporary files (buffile.c): we derive the IV from PID of the process that
> created the file + segment number + block within the segment. This
> information does not change if you need to write the same block again. If
> new IV should be used for each encryption run, we can simply introduce an
> in-memory counter that generates the IV for each block. However it becomes
> trickier if the temporary file is shared by multiple backends. I think it
> might still be easier to expose the IV values to other backends via shared
> memory than to store them on disk ...
>
> * "Buffered transient file". This is to be used instead of OpenTransientFile()
> if user needs the option to encrypt the file. (Our patch adds this API to
> buffile.c. Currently we use it in reorderbuffer.c to encrypt the data
> changes produced by logical decoding, but there should be more use cases.)
>
> In this case we cannot keep the IVs in memory because user can close the
> file anytime and open it much later. So we derive the IV by hashing the file
> path. However if we should generate the IV again and again, we need to store
> it on disk in another way, probably one IV value per block (PGAlignedBlock).
>
> However since our implementation of both these file types shares some code,
> it might yet be easier if the shared temporary file also stored the IV on
> disk instead of exposing it via shared memory ...
>
> Perhaps this is what I can work on, but I definitely need some feedback.

I think this would be a valuable thing upon which to work. I'm not sure exactly what the right solution is, but it seems to me that it would be a good thing if we tried to reuse the same solution in as many places as possible. I don't know if it's realistic to use the same method for storing IVs for temporary/transient files as we do for SLRUs, but it would be nice if it were.

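The IV-reuse concern raised in the quoted text can be illustrated with a minimal Python sketch. The 12-byte layouts, the field widths, and the names `derive_iv` and `CounterIV` are assumptions made for illustration only; they are not taken from the patch. The sketch shows that an IV derived purely from PID, segment number, and block number is identical each time the same block is rewritten, while an in-memory counter produces a fresh value for every encryption run.

```python
import struct


def derive_iv(pid: int, segment: int, block: int) -> bytes:
    """Hypothetical deterministic IV: 4-byte PID, 4-byte segment, 4-byte block."""
    return struct.pack(">III", pid, segment, block)


class CounterIV:
    """Hypothetical in-memory generator: 4-byte PID plus an 8-byte counter."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.counter = 0

    def next_iv(self) -> bytes:
        # The counter only ever increases, so no value repeats within a process.
        iv = struct.pack(">IQ", self.pid, self.counter)
        self.counter += 1
        return iv


# Writing block 7 of segment 0 twice with the deterministic scheme
# yields the same IV both times.
first = derive_iv(4242, 0, 7)
second = derive_iv(4242, 0, 7)
iv_reused = first == second

# The counter-based scheme hands out a different IV for each write.
gen = CounterIV(4242)
fresh_a = gen.next_iv()
fresh_b = gen.next_iv()
counter_ivs_differ = fresh_a != fresh_b
```

The counter avoids reuse, but its values then have to be remembered somewhere (in memory, in shared memory, or on disk) so that the block can be decrypted later, which is the storage question discussed in this thread.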
I think that one problem with trying to store the data in memory is that these files get big enough that N bytes/block could still be pretty big. For instance, if you're sorting 100GB of data with 8GB of work_mem, you'll need to write 13 tapes and then merge them. Supposing an IV of 12 bytes/block and the default 8kB block size, the IV vector for each 8GB tape will be 12MB, so once you've written all 13 tapes and are ready to merge them, you're going to have 156MB of IV data floating around. If you keep it in memory, it ought to count against your work_mem budget, and while it's not a big fraction of your available memory, it's also not negligible. Worse (but less realistic) cases can also be constructed.

To avoid this kind of problem, you could write the IV data to disk. But notice that tuplesort.c goes to a lot of work to make I/O sequential, and that helps performance. If you have to intersperse reads of separate IV files with the reads of the main data files, you're going to degrade the I/O pattern. It would really be best if the IVs were in line with the data itself, I think. (The same probably applies, and for not unrelated reasons, to SLRU data, if we're going to try to encrypt that.)

Now, if you could store some kind of an IV "seed" where we only need one per buffile rather than one per block, then that'd probably be fine to store in memory. But I don't see how that would work given that we can overwrite already-written blocks and need a new IV if we do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
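The memory estimate in the message can be reproduced with a short Python sketch. It assumes the default 8kB PostgreSQL block size and, as the message does, treats every tape as a full 8GB run; the variable names are illustrative only.

```python
import math

BLOCK_SIZE = 8192          # assumption: default PostgreSQL block size (8kB)
IV_BYTES_PER_BLOCK = 12    # IV size supposed in the message
MB = 1024 ** 2
GB = 1024 ** 3

data_size = 100 * GB
work_mem = 8 * GB

# Simplification: each sorted run (tape) is exactly work_mem in size.
tapes = math.ceil(data_size / work_mem)

# One IV per block of an 8GB tape.
blocks_per_tape = work_mem // BLOCK_SIZE
iv_bytes_per_tape = blocks_per_tape * IV_BYTES_PER_BLOCK

# IV data held at merge time if every tape's IV vector stays in memory.
total_iv_bytes = tapes * iv_bytes_per_tape

# Relative space cost if the IV were instead stored in line with each block.
inline_overhead = IV_BYTES_PER_BLOCK / BLOCK_SIZE

print(tapes, iv_bytes_per_tape // MB, total_iv_bytes // MB)
```

Under these assumptions the in-line alternative costs about 0.15% extra space per block (12 bytes out of 8192) and keeps each IV next to the data it protects, so reads stay sequential.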