Re: Moving forward with TDE [PATCH v3] - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Moving forward with TDE [PATCH v3] |
Date | |
Msg-id | CAEze2WgwzUUM_Af+KmS=0sJmCLGz-m0Nr1NBP8oLnFm2mdHV0w@mail.gmail.com Whole thread Raw |
In response to | Re: Moving forward with TDE [PATCH v3] (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Moving forward with TDE [PATCH v3]
|
List | pgsql-hackers |
On Sat, 4 Nov 2023 at 03:38, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote: > > I'm quite surprised at the significant number of changes being made > > outside the core storage manager files. I thought that changing out > > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired) > > would be the most obvious change to implement cluster-wide encryption > > with the least code touched, as relations don't need to know whether > > the files they're writing are encrypted, right? Is there a reason to > > not implement this at the smgr level that I overlooked in the > > documentation of these patches? > > You can't really implement encryption transparently inside an smgr without > significant downsides. You need a way to store an initialization vector > associated with the page (or you can store that elsewhere, but then you've > doubled the worst cse amount of random reads/writes). The patch uses the LSN > as the IV (which I doubt is a good idea). For authenticated encryption further > additional storage space is required. I am unaware of any user of the smgr API that doesn't also use the buffer cache, and thus implicitly the Page layout with PageHeader [^1]. The API of smgr is also tailored to page-sized quanta of data with mostly relation-level information. I don't see why there would be a veil covering the layout of Page for smgr when all other information already points to the use of PageHeader and Page layouts. In my view, it would even make sense to allow the smgr to get exclusive access to some part of the page in the current Page layout. Yes, I agree that there will be an impact on usable page size if you want authenticated encryption, and that AMs will indeed need to account for storage space now being used by the smgr - inconvenient, but it serves a purpose. That would happen regardless of whether smgr or some higher system decides where to store the data for encryption - as long as it is on the page, the AM effectively can't use those bytes. But I'd say that's best solved by making the Page documentation and PageInit API explicit about the potential use of that space by the chosen storage method (encrypted, plain, ...) instead of requiring the various AMs to manually consider encryption when using Postgres' APIs for writing data to disk without hitting shared buffers; page space management is already a task of AMs, but handling the actual encryption is not. Should the AM really care whether the data on disk is encrypted or not? I don't think so. When the disk contains encrypted bytes, but smgrread() and smgrwrite() both produce and accept plaintext data, who's going to complain? Requiring AMs to be mindful about encryption on all common paths only adds pitfalls where encryption would be forgotten by the developer of AMs in one path or another. > To be able to to use the LSN as the IV, the patch needs to ensure that the LSN > increases in additional situations (a more aggressive version of > wal_log_hint_bits) - which can't be done below smgr, where we don't know about > what WAL logging was done. Nor can you easily just add space on the page below > md.c, for the purpose of storing an LSN independent IV and the authentication > data. I think that getting PageInit to allocate the smgr-specific area would take some effort, too (which would potentially require adding some relational context to PageInit, so that it knows which page of which relation it is going to initialize), but IMHO that would be more natural than requiring all index and table AMs to be aware the actual encryption of its pages and require manual handling of that encryption when the page needs to be written to disk, when it otherwise already conforms to the various buffer management and file extension APIs currently in use in PostgreSQL. I would expect "transparent" data encryption to be handled at the file write layer (i.e. smgr), not inside the AMs. Kind regards, Matthias van de Meent [^1] ReadBuffer_common uses PageIsVerifiedExtended which verifies that a page conforms with Postgres' Page layout if checksums are enabled. Furthermore, all builtin index AMs utilize pd_special, further implying the use of a PageInit/PageHeader-based page layout. Additionally, the heap tableAM also complies, and both FSM and VM also use postgres' Page layout. As for other AMs that I could check: bloom, rum, and pgvector's ivfflat and hnsw all use page layouts.
pgsql-hackers by date: