Re: RFC: PostgreSQL Storage I/O Transformation Hooks - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: RFC: PostgreSQL Storage I/O Transformation Hooks |
| Date | |
| Msg-id | e3214639-36b8-42ec-ac69-cb4379962fbc@vondra.me |
| In response to | Re: RFC: PostgreSQL Storage I/O Transformation Hooks (Henson Choi <assam258@gmail.com>) |
| Responses | Re: RFC: PostgreSQL Storage I/O Transformation Hooks |
| List | pgsql-hackers |
Please don't top-post. We generally prefer to reply in-line, which makes
it easier to follow the discussion. With top-posting I have to seek what
you are responding to.

On 12/29/25 03:35, Henson Choi wrote:
> Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
>
> Hi Tomas,
>
> Thank you for this critical feedback. Your concerns go to the heart of
> the proposal's viability, and I appreciate your directness.
>
>
> 1. Multiple Extensions and Hook Chaining
>
> You're right to question this. To be honest, I have significant doubts
> about allowing multiple transformation extensions simultaneously.
>
> The Transform ID coordination problem is real: without a registry or
> protocol between extensions, they cannot cooperate safely. Hook chaining
> for read/write operations might work (extension A encrypts, extension B
> compresses), but the Transform ID field creates conflicts.
>
> Perhaps I should be more direct: transformation hook chaining is not
> realistically possible with the current design. TDE extensions would
> need exclusive use of these hooks. This is a fundamental limitation I
> should have stated clearly in the RFC.
>

Isn't that just another argument against using hooks? Chaining is what
hooks do, and there's no protection against a hook being set by multiple
extensions.

>
> 2. pd_flags Reservation - I Hope You'll Consider This
>
> I understand your concern about reserving pd_flags bits for extensions.
> However, I'd like to ask you to consider the reasoning behind this choice.
>
> The 5-bit Transform ID serves a critical purpose: it allows the core to
> identify the page's transformation state without attempting decryption.
> This is important for:
>
> - Error reporting: "This page is encrypted with transform ID 5, but no
>   extension is loaded to handle it"
> - Migration safety: Distinguishing between untransformed pages (ID=0)
>   and transformed pages during gradual encryption
> - Crash recovery: The core can detect transformation state inconsistencies
>
> That said, I recognize pd_flags is precious and limited. Let me propose
> an alternative approach that might better align with core principles:
>

The information may be crucial, but pd_flags is simply not meant to be
used by extensions to store custom data.

> Instead of extension-specific Transform IDs, what if we allow extensions
> to reserve space at pd_upper (similar to how special space works at
> pd_special)?
>
> The core could manage a small flag (2-3 bits) indicating "N bytes at
> pd_upper are reserved for transformation metadata". By encoding N as
> multiples of 2 or 4 bytes, we maximize the flag's efficiency:
>
> - 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
> - 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
> - 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
>
> This approach uses minimal pd_flags bits while providing substantial
> metadata space. It would:
>
> - Keep the flag in core control (not extension-specific)
> - Allow extensions to store IV, authentication tags, key version, etc.
>   in a standardized location
> - Be self-describing (the flag tells you how much space is reserved)
> - Generalize beyond encryption (compression, checksums, etc. could use it)
>
> In our internal implementation, we actually add opaque bytes to
> PageHeader for encryption metadata. This pd_upper approach could
> formalize that pattern for extensions.
>
> I believe some form of page-level metadata for transformations is
> necessary.
> Would either approach (Transform ID or pd_upper reservation)
> be acceptable with the right design, or do you see fundamental issues
> with page-level transformation metadata itself?
>

AFAICS this is pretty much exactly what this patch aimed to do (also to
allow implementing TDE):

https://commitfest.postgresql.org/patch/3986/

Clearly, it's not as simple as it may seem, otherwise the patch would
not be WIP for 3 years.

>
> 3. Maintenance Burden and Test Coverage
>
> I deeply appreciate this concern. Having worked across various DBMS
> implementations, I've seen solution vendors ship without comprehensive
> regression testing - but never a database vendor. DBMS maintenance is
> extraordinarily difficult, and storage errors are catastrophic.
>
> This is precisely why test_tde exists as a reference implementation. But
> you've identified the real issue: we need much stronger test coverage
> for the hooks themselves.
>
> The test cases should:
> - Detect when core changes break hook contracts
> - Verify hook behavior under all I/O paths (sync, async, error cases)
> - Validate critical section safety
> - Test interaction with checksums, crash recovery, replication
>
> I agree the current test coverage is insufficient for core inclusion.
> Would expanding the test suite to cover these scenarios address your
> maintenance concerns, or do you see fundamental fragility beyond what
> testing can solve?
>

I wasn't talking about test coverage. My point is we'd have to keep this
working forever, even if we choose to change how the SMGR works. Which
is not entirely theoretical.

>
> 4. Hooks vs Transform Layer - Pragmatic Timeline
>
> You suggested improving SMGR extensibility rather than adding hooks. I
> think you're architecturally right about the long-term direction.
>
> However, I want to be pragmatic about timelines:
>
> The hook and pd_flags approach, despite its limitations, can deliver
> working TDE in the shortest time.
> Organizations facing regulatory
> deadlines need something that works now, not in 2-3 years.
>

Others may see it differently, but my opinion is using pd_flags is a
dead end.

I realize users may wish for a solution "soon", but we're not going to
accept a flawed approach because of that. Exchanging short-term benefit
for long-term pain does not seem like a good trade-off.

> That said, your feedback has sparked a better idea: what if we think of
> this not as "SMGR extension" or "hooks" but as a pluggable Transform
> Layer that SMGR and WAL subsystems delegate to?
>
> Conceptually:
>
>     Application Layer
>            |
>      Buffer Manager
>            |
>   +------------------+
>   | Transform Layer  |  <-- Encryption, etc.
>   +------------------+
>            |
>       SMGR / WAL
>            |
>        File I/O
>
> This is architecturally cleaner than scattered hooks, and more focused
> than full SMGR extensibility. The Transform Layer would:
>
> - Provide a unified interface for data transformation
> - Work across backend, frontend tools, and replication
> - Handle metadata management in a standardized way
> - Support encryption, compression, or other transformations
>
> I think this deserves its own discussion thread rather than conflating
> it with the current hook proposal. Would you be interested in starting a
> separate conversation about designing a Transform Layer interface for
> PostgreSQL?
>

Maybe. But I'm not convinced it'd be great to have many parallel threads
discussing approaches for the same ultimate end goal.

> In the meantime, the hook approach could serve organizations with
> immediate needs, and extensions could migrate to the Transform Layer
> once it's stabilized.
>

It's not like there are no alternatives, though. We have FDE/LUKS,
application-level encryption, etc. Now there's also pg_tde.

FWIW the hypothetical migration would be far from trivial.

>
> 5. Frontend Tool Access
>
> Both SMGR and hook approaches face a shared limitation: frontend tools
> (pg_checksums, pg_basebackup, etc.) that read files directly.
>

I'm not a TDE expert, but I don't see why tools like pg_basebackup would
need to be aware of this at all. A basebackup is just a filesystem copy.

> I previously suggested allowing initdb to specify a shared library that
> both backend and frontend can load for transformation. But as I
> reconsider this, it feels like it converges toward the Transform Layer
> idea: a well-defined interface that any PostgreSQL component can use.
>
> This might be the real architectural question: not "hooks vs SMGR" but
> "how should PostgreSQL provide transformation points that work across
> backend, frontend, and replication boundaries?"
>

Maybe. I was not proposing a new "transformation" layer, though. My
suggestion was entirely within the current SMGR architecture.


regards

--
Tomas Vondra
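
For readers who want to check the arithmetic in the quoted pd_upper proposal (a 2-3 bit field counting reserved bytes in 2- or 4-byte steps), here is a minimal Python sketch. It is only an illustration: the function names, the bit position of the field, and the choice to place it directly above the three bits pd_flags already defines (0x0001, 0x0002, 0x0004) are all assumptions, nothing like it exists in PostgreSQL, and the reply above argues against using pd_flags this way at all.

```python
# Hypothetical sketch of a small "reserved bytes" field in a 16-bit flags word.
# None of these names or bit positions exist in PostgreSQL.

EXISTING_FLAG_BITS = 0x0007  # the three pd_flags bits PostgreSQL defines today
FIELD_SHIFT = 3              # assumed position: directly above the existing bits


def max_reserved_bytes(bits: int, unit: int) -> int:
    """Largest byte count expressible with `bits` bits counting `unit`-byte steps."""
    return ((1 << bits) - 1) * unit


def encode(flags: int, nbytes: int, bits: int, unit: int) -> int:
    """Return `flags` with the length field set to describe `nbytes` bytes."""
    if nbytes % unit != 0:
        raise ValueError(f"{nbytes} is not a multiple of {unit}")
    if not 0 <= nbytes <= max_reserved_bytes(bits, unit):
        raise ValueError(f"{nbytes} bytes does not fit in {bits} bits")
    mask = ((1 << bits) - 1) << FIELD_SHIFT
    # The hypothetical field must not overlap the bits already in use.
    assert mask & EXISTING_FLAG_BITS == 0
    return (flags & ~mask & 0xFFFF) | ((nbytes // unit) << FIELD_SHIFT)


def decode(flags: int, bits: int, unit: int) -> int:
    """Read the reserved byte count back out of `flags`."""
    return ((flags >> FIELD_SHIFT) & ((1 << bits) - 1)) * unit


if __name__ == "__main__":
    # The three configurations listed in the quoted proposal.
    for bits, unit in [(2, 4), (3, 4), (3, 2)]:
        print(f"{bits} bits, {unit}-byte units: "
              f"0-{max_reserved_bytes(bits, unit)} bytes")
```

The three configurations come out as 0-12, 0-28, and 0-14 bytes, matching the figures in the quoted list. The sketch says nothing about whether such a field is acceptable in core, which is the actual point of disagreement in the thread.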