Re: RFC: PostgreSQL Storage I/O Transformation Hooks - Mailing list pgsql-hackers
| From | Henson Choi |
|---|---|
| Subject | Re: RFC: PostgreSQL Storage I/O Transformation Hooks |
| Date | |
| Msg-id | CAAAe_zBUGMfOCo8JJTJiT+9Uo9TWnd_36e59iuntPObByqmDHw@mail.gmail.com |
| In response to | Re: RFC: PostgreSQL Storage I/O Transformation Hooks (Tomas Vondra <tomas@vondra.me>) |
| List | pgsql-hackers |
Hi Tomas,
Thank you for this critical feedback. Your concerns go to the heart of the proposal's viability, and I appreciate your directness.
1. Multiple Extensions and Hook Chaining
You're right to question this. To be honest, I have significant doubts about allowing multiple transformation extensions simultaneously.
The Transform ID coordination problem is real: without a registry or protocol between extensions, they cannot cooperate safely. Hook chaining for read/write operations might work (extension A encrypts, extension B compresses), but each page carries only one Transform ID, so two extensions cannot both record their state in it.
Perhaps I should be more direct: transformation hook chaining is not realistically possible with the current design. TDE extensions would need exclusive use of these hooks. This is a fundamental limitation I should have stated clearly in the RFC.
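To illustrate what "exclusive use" could look like in practice, here is a minimal standalone sketch, not code from the RFC patch. The hook type, the hook variable, and register_page_transform_hook() are all hypothetical names. Instead of the usual save-and-call-previous convention, registration simply fails if the slot is already taken:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook signature: transform one page buffer in place. */
typedef void (*page_transform_hook_type) (char *page, size_t len);

/* Hypothetical hook variable; NULL means no transformation is installed. */
static page_transform_hook_type page_transform_hook = NULL;

/*
 * Install a hook only if the slot is free. Returns false rather than
 * chaining, so a second extension fails loudly at load time.
 */
bool
register_page_transform_hook(page_transform_hook_type hook)
{
    if (hook == NULL || page_transform_hook != NULL)
        return false;
    page_transform_hook = hook;
    return true;
}

/* Two stand-in "extensions", present only to show the conflict. */
void
tde_a_transform(char *page, size_t len)
{
    (void) page;
    (void) len;
}

void
tde_b_transform(char *page, size_t len)
{
    (void) page;
    (void) len;
}
```

With this shape, the first extension to call register_page_transform_hook() wins and the second gets false back, which it could turn into an error at library load.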
2. pd_flags Reservation - I Hope You'll Consider This
I understand your concern about reserving pd_flags bits for extensions. However, I'd like to ask you to consider the reasoning behind this choice.
The 5-bit Transform ID serves a critical purpose: it allows the core to identify the page's transformation state without attempting decryption. This is important for:
- Error reporting: "This page is encrypted with transform ID 5, but no extension is loaded to handle it"
- Migration safety: Distinguishing between untransformed pages (ID=0) and transformed pages during gradual encryption
- Crash recovery: The core can detect transformation state inconsistencies
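As a rough sketch of the idea, assuming the 5-bit ID sits in bits 3..7 of pd_flags (that placement, the macro names, and the helper functions are my assumptions for illustration; only the three low flag bits are taken from bufpage.h as I understand it), the core could classify a page without touching its contents:

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits already used by core in bufpage.h. */
#define PD_HAS_FREE_LINES   0x0001
#define PD_PAGE_FULL        0x0002
#define PD_ALL_VISIBLE      0x0004

/* Hypothetical placement: bits 3..7 carry the 5-bit Transform ID. */
#define PD_TRANSFORM_SHIFT  3
#define PD_TRANSFORM_MASK   0x00F8

typedef enum
{
    PAGE_PLAIN,         /* ID = 0, page is untransformed */
    PAGE_HANDLED,       /* ID matches the loaded extension */
    PAGE_NO_HANDLER     /* transformed, but nothing loaded can handle it */
} PageTransformState;

uint8_t
page_get_transform_id(uint16_t pd_flags)
{
    return (uint8_t) ((pd_flags & PD_TRANSFORM_MASK) >> PD_TRANSFORM_SHIFT);
}

uint16_t
page_set_transform_id(uint16_t pd_flags, uint8_t id)
{
    /* Clear the ID field, then store the low 5 bits of id. */
    pd_flags &= (uint16_t) ~PD_TRANSFORM_MASK;
    return (uint16_t) (pd_flags | ((uint16_t) (id & 0x1F) << PD_TRANSFORM_SHIFT));
}

/* loaded_id is the ID of the loaded extension, or 0 if none is loaded. */
PageTransformState
classify_page(uint16_t pd_flags, uint8_t loaded_id)
{
    uint8_t id = page_get_transform_id(pd_flags);

    if (id == 0)
        return PAGE_PLAIN;
    if (id == loaded_id)
        return PAGE_HANDLED;
    return PAGE_NO_HANDLER;
}
```

PAGE_NO_HANDLER is the case that would produce the error message above, and PAGE_PLAIN is what a gradual migration would see for pages not yet encrypted.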
That said, I recognize pd_flags is precious and limited. Let me propose an alternative approach that might better align with core principles:
Instead of extension-specific Transform IDs, what if we allow extensions to reserve space at pd_upper (similar to how special space works at pd_special)?
The core could manage a small flag (2-3 bits) indicating "N bytes at pd_upper are reserved for transformation metadata". By encoding N as multiples of 2 or 4 bytes, we maximize the flag's efficiency:
- 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
- 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
- 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
This approach uses minimal pd_flags bits while providing substantial metadata space. It would:
- Keep the flag in core control (not extension-specific)
- Allow extensions to store IV, authentication tags, key version, etc. in a standardized location
- Be self-describing (the flag tells you how much space is reserved)
- Generalize beyond encryption (compression, checksums, etc. could use it)
In our internal implementation, we actually add opaque bytes to PageHeader for encryption metadata. This pd_upper approach could formalize that pattern for extensions.
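To make the size encoding concrete, here is a small sketch of the 3-bit, 4-byte-multiple variant. The bit position (bits 3..5), the macro names, and both helper functions are hypothetical. This is only meant to show the arithmetic, not a proposed patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: bits 3..5 of pd_flags hold the reserved size in 4-byte units. */
#define PD_XFORM_META_SHIFT 3
#define PD_XFORM_META_BITS  3
#define PD_XFORM_META_UNIT  4
#define PD_XFORM_META_MAX   ((1u << PD_XFORM_META_BITS) - 1u)
#define PD_XFORM_META_MASK  (PD_XFORM_META_MAX << PD_XFORM_META_SHIFT)

/* Decode the number of metadata bytes reserved at pd_upper (0..28). */
unsigned
page_get_meta_size(uint16_t pd_flags)
{
    return ((pd_flags & PD_XFORM_META_MASK) >> PD_XFORM_META_SHIFT)
        * PD_XFORM_META_UNIT;
}

/*
 * Encode nbytes into the flag field. Returns 0 on success, or -1 if nbytes
 * is not a multiple of 4 or exceeds 28; the flags are left untouched then.
 */
int
page_set_meta_size(uint16_t *pd_flags, unsigned nbytes)
{
    unsigned units = nbytes / PD_XFORM_META_UNIT;

    if (nbytes % PD_XFORM_META_UNIT != 0 || units > PD_XFORM_META_MAX)
        return -1;
    *pd_flags = (uint16_t) ((*pd_flags & ~PD_XFORM_META_MASK)
                            | (units << PD_XFORM_META_SHIFT));
    return 0;
}
```

Changing PD_XFORM_META_BITS to 2, or PD_XFORM_META_UNIT to 2, gives the other two variants listed above (0-12 bytes and 0-14 bytes respectively).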
I believe some form of page-level metadata for transformations is necessary. Would either approach (Transform ID or pd_upper reservation) be acceptable with the right design, or do you see fundamental issues with page-level transformation metadata itself?
3. Maintenance Burden and Test Coverage
I deeply appreciate this concern. Having worked across various DBMS implementations, I've seen solution vendors ship without comprehensive regression testing - but never a database vendor. DBMS maintenance is extraordinarily difficult, and storage errors are catastrophic.
This is precisely why test_tde exists as a reference implementation. But you've identified the real issue: we need much stronger test coverage for the hooks themselves.
The test cases should:
- Detect when core changes break hook contracts
- Verify hook behavior under all I/O paths (sync, async, error cases)
- Validate critical section safety
- Test interaction with checksums, crash recovery, replication
I agree the current test coverage is insufficient for core inclusion. Would expanding the test suite to cover these scenarios address your maintenance concerns, or do you see fundamental fragility beyond what testing can solve?
4. Hooks vs Transform Layer - Pragmatic Timeline
You suggested improving SMGR extensibility rather than adding hooks. I think you're architecturally right about the long-term direction.
However, I want to be pragmatic about timelines:
The hook and pd_flags approach, despite its limitations, can deliver working TDE in the shortest time. Organizations facing regulatory deadlines need something that works now, not in 2-3 years.
That said, your feedback has sparked a better idea: what if we think of this not as "SMGR extension" or "hooks" but as a pluggable Transform Layer that SMGR and WAL subsystems delegate to?
Conceptually:
    Application Layer
            |
      Buffer Manager
            |
   +------------------+
   | Transform Layer  |  <-- Encryption, etc.
   +------------------+
            |
       SMGR / WAL
            |
        File I/O
This is architecturally cleaner than scattered hooks, and more focused than full SMGR extensibility. The Transform Layer would:
- Provide a unified interface for data transformation
- Work across backend, frontend tools, and replication
- Handle metadata management in a standardized way
- Support encryption, compression, or other transformations
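Just to give a feel for the shape such an interface might take, here is a deliberately tiny sketch. Every name in it (TransformOps, layer_write, layer_read, xor_apply) is hypothetical, and the XOR routine is a toy stand-in for a real transformation, not encryption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface: one table of callbacks per transformation. */
typedef struct TransformOps
{
    const char *name;
    int         (*encode) (uint8_t *buf, size_t len, void *ctx);   /* before write */
    int         (*decode) (uint8_t *buf, size_t len, void *ctx);   /* after read */
    void       *ctx;            /* opaque per-transformation state */
} TransformOps;

/* Toy transformation: XOR every byte with a one-byte key. Self-inverse. */
int
xor_apply(uint8_t *buf, size_t len, void *ctx)
{
    uint8_t key = *(const uint8_t *) ctx;

    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
    return 0;
}

/* What SMGR/WAL would call on the write path; NULL ops means pass-through. */
int
layer_write(const TransformOps *ops, uint8_t *buf, size_t len)
{
    return ops ? ops->encode(buf, len, ops->ctx) : 0;
}

/* What SMGR/WAL would call on the read path; NULL ops means pass-through. */
int
layer_read(const TransformOps *ops, uint8_t *buf, size_t len)
{
    return ops ? ops->decode(buf, len, ops->ctx) : 0;
}
```

Because callers only see layer_write() and layer_read(), the same table could in principle be used by the backend and by frontend tools, which is the property the hook approach lacks.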
I think this deserves its own discussion thread rather than conflating it with the current hook proposal. Would you be interested in starting a separate conversation about designing a Transform Layer interface for PostgreSQL?
In the meantime, the hook approach could serve organizations with immediate needs, and extensions could migrate to the Transform Layer once it's stabilized.
5. Frontend Tool Access
Both SMGR and hook approaches face a shared limitation: frontend tools (pg_checksums, pg_basebackup, etc.) that read files directly.
I previously suggested allowing initdb to specify a shared library that both backend and frontend can load for transformation. But as I reconsider this, it feels like it converges toward the Transform Layer idea: a well-defined interface that any PostgreSQL component can use.
This might be the real architectural question: not "hooks vs SMGR" but "how should PostgreSQL provide transformation points that work across backend, frontend, and replication boundaries?"
Summary
Your feedback has clarified three important points:
1. The current hook design has real limitations (multiple extension conflicts, pd_flags concerns)
2. Test coverage needs to be much more comprehensive
3. A cleaner abstraction might be needed long-term
I propose a dual approach:
Short-term: Move forward with the hook proposal for organizations with immediate regulatory needs. I commit to:
- Stating clearly that hook chaining is not supported
- Significantly expanding test coverage
- Treating this as a pragmatic solution with known limitations
Long-term: I'd like to start a separate discussion about a Transform Layer abstraction - a unified interface that could handle data transformation across backend, frontend tools, and replication. This would be architecturally cleaner than scattered hooks, and could eventually supersede this approach.
Would you be willing to review a Transform Layer proposal in a separate thread? I think it addresses the architectural concerns you've raised, while the hook approach serves immediate practical needs.
Best regards,
Henson
On Mon, Dec 29, 2025 at 4:24 AM, Tomas Vondra <tomas@vondra.me> wrote:
On 12/28/25 08:49, Henson Choi wrote:
>
> 3. Proposal Specifications
>
>
> 3.1 The Interface (Hook Points)
>
> We allow intervention by security experts through five contact points
> along the I/O path:
>
> * *Read/Write Hooks:* |mdread_post|, |mdwrite_pre|, |mdextend_pre|
> (Transformation of the data area)
> * *WAL Hooks:* |xlog_insert_pre|, |xlog_decode_pre| (Transformation of
> transaction logs)
>
>
> 3.2 The Protocol Identifier (PageHeader Transformation ID)
>
> We allocate 5 bits of |pd_flags| to define the “Security State” of a
> page. This serves as a *Status Message* sent by the security expert to
> the engine, utilized for key versioning and as a migration marker.
>
Isn't this rather problematic?
This seems to be meant to be extensible, which means there can be
multiple extensions setting the hooks. Which we generally allow, and the
custom is to call the previous hook.
What happens if there are multiple extensions implementing the hook?
Would that be allowed or prohibited in this case? Maybe it doesn't make
sense, but then why wouldn't it be possible?
FWIW I find it very unlikely we'd allow reserving pd_flags bits for an
extension. These bits are meant to be used by core, there's very limited
number of such bits.
In general, I'm somewhat skeptical of the claim a collection of hooks is
"low-barrier, high-safety". It seems pretty fragile to me, and I can
envision a lot of maintenance difficulties in the future. Not just for
the extension developers, but for the project too - adding a bunch of
random hooks is not free for us, we'll need to keep it working in future
releases, etc.
Perhaps the current SMGR code is not extensible/flexible enough, but
then we need to improve that. I'd imagine a simple SMGR doing the
encryption, but federating most of the work to a "full" SMGR. But I
haven't thought about that too much.
regards
--
Tomas Vondra