Re: RFC: PostgreSQL Storage I/O Transformation Hooks - Mailing list pgsql-hackers

From Henson Choi
Subject Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Msg-id CAAAe_zAosQq6e6DeReQjZcd7C85BPRCCYOBdyh41FgMrOxuVsg@mail.gmail.com
In response to Re: RFC: PostgreSQL Storage I/O Transformation Hooks  (Zsolt Parragi <zsolt.parragi@percona.com>)
Responses Re: RFC: PostgreSQL Storage I/O Transformation Hooks
List pgsql-hackers

Hi Zsolt,

Thank you for your detailed questions. I'll address each point:

1. Bundling WAL and Buffer Manager

WAL and heap pages are simply different representations of the same
underlying data. Protecting only one side would be cryptographically
incomplete; an attacker could bypass encryption by reading the
unprotected side. Therefore, they must be treated as a single atomic
unit of protection.

2. Scope: Temporary Files, System Tables, and Frontend Tools

I intentionally kept the scope focused. Past TDE proposals often stalled
because they tried to solve everything at once, becoming too large to
review. I prefer a "divide-and-conquer" approach:

- Temporary files: Out of scope for this initial infrastructure proposal.
- System tables: While they cannot be encrypted during bootstrap (since
  extensions aren't loaded), they can be transformed page-by-page during
  normal operation.
- Frontend tools (pg_waldump, etc.): I am aware of this and have modified
  versions. Currently, there is no standard mechanism for frontend hooks,
  making this a broader challenge. For production, extensions could ship
  their own modified frontend tools temporarily. Long-term, we may need
  initdb-time configurations, fixed for the lifetime of the cluster, to
  unify backend/frontend hook behavior.

3. Why Hooks Instead of SMGR

Please see my response to Konstantin in this thread regarding maintenance
debt and the "Separation of Concerns" between storage management and data
transformation.

4. Page Header Flags vs. Fork Files

My primary concern with using fork files for encryption metadata is crash
recovery. If a fork file and the actual data page become inconsistent
(e.g., during a crash), recovery becomes problematic because fork files
are not typically protected by WAL.

Storing the Transform ID in the header flags ensures that the metadata
travels with the page. This is essential for incremental key rotation,
where pages are gradually re-encrypted with newer keys over time. The
oldest key's pages are force-rotated, allowing continuous key rotation
without service interruption. I plan to propose a separate RFC for this
"gradual rotation" mechanism.

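To make the idea concrete, here is a minimal sketch of how a Transform ID
could ride in the page header flag word. It is not code from the patch. The
bit layout (5 bits directly above the three core pd_flags bits, 0x0007) and
all names below are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Core PostgreSQL flag bits currently occupy 0x0007 in pd_flags. */
#define PD_CORE_FLAG_BITS      0x0007

/* ASSUMPTION: hypothetical layout, 5 bits directly above the core flags. */
#define PD_TRANSFORM_ID_SHIFT  3
#define PD_TRANSFORM_ID_MASK   0x00F8
#define MAX_TRANSFORM_ID       (PD_TRANSFORM_ID_MASK >> PD_TRANSFORM_ID_SHIFT)

/* Extract the transform ID stored in a page's flag word. */
static inline uint16_t
page_get_transform_id(uint16_t pd_flags)
{
    return (uint16_t) ((pd_flags & PD_TRANSFORM_ID_MASK) >> PD_TRANSFORM_ID_SHIFT);
}

/* Return a new flag word carrying `id`, leaving all other bits untouched. */
static inline uint16_t
page_set_transform_id(uint16_t pd_flags, uint16_t id)
{
    assert(id <= MAX_TRANSFORM_ID);
    return (uint16_t) ((pd_flags & ~PD_TRANSFORM_ID_MASK) |
                       (id << PD_TRANSFORM_ID_SHIFT));
}

/*
 * Gradual rotation (hypothetical policy): a page still tagged with the
 * oldest key's ID must be re-encrypted before that key can be retired.
 */
static inline bool
page_needs_rotation(uint16_t pd_flags, uint16_t oldest_id)
{
    return page_get_transform_id(pd_flags) == oldest_id;
}
```

Because the ID lives in the page itself, a reader can pick the right key from
the page alone, and a background pass could call page_needs_rotation() on each
page it visits to decide whether to rewrite it under a newer key.
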
5. Benchmarks and Critical Section Overhead

Transformation happens inside the critical section but before acquiring
the WAL lock. On consumer-grade SSDs, encryption latency is largely
masked by I/O wait, so the performance impact is negligible. On
high-performance storage (production SSDs, Apple Silicon, etc.), the
reduced I/O wait exposes the encryption overhead, which is visible but
modest. Publishing detailed benchmarks requires company approval; I will
follow up later.
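
The ordering described above can be sketched as a toy model. This is not the
patch: the hook type, the XOR "cipher", the in-memory WAL buffer, and the
counter standing in for the WAL lock are all hypothetical, and the critical
section itself is not modeled. The only point is that the transform runs
before the lock is taken, so its cost is not paid while other backends wait.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL hook type; the real patch may use a different signature. */
typedef void (*wal_transform_hook_type) (uint8_t *data, size_t len);
static wal_transform_hook_type wal_transform_hook = NULL;

/* Toy stand-ins for the WAL buffer and its insertion lock. */
static uint8_t wal_buffer[256];
static size_t  wal_used = 0;
static int     wal_lock_depth = 0;
static int     transforms_under_lock = 0;   /* stays 0 if ordering holds */

/* Placeholder transform (XOR), standing in for a real cipher. */
static void
toy_xor_transform(uint8_t *data, size_t len)
{
    if (wal_lock_depth > 0)
        transforms_under_lock++;
    for (size_t i = 0; i < len; i++)
        data[i] ^= 0x5A;
}

/* Append a record: transform first, then hold the "lock" only for the copy. */
static int
wal_insert(uint8_t *rec, size_t len)
{
    assert(rec != NULL);
    if (len > sizeof(wal_buffer) - wal_used)
        return -1;                          /* no room in the toy buffer */

    if (wal_transform_hook != NULL)         /* empty-hook path: one branch */
        wal_transform_hook(rec, len);

    wal_lock_depth++;                       /* "acquire" the WAL lock */
    memcpy(wal_buffer + wal_used, rec, len);
    wal_used += len;
    wal_lock_depth--;                       /* "release" the WAL lock */
    return 0;
}
```

With no plugin installed the cost is a single pointer comparison; with a
plugin, the time spent in the transform adds to the backend's own latency but
not to the time the lock is held.
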

Best regards,
Henson Choi

On Sun, Dec 28, 2025 at 10:12 PM, Zsolt Parragi <zsolt.parragi@percona.com> wrote:
Hello!

I am glad to see that there are multiple TDE extension proposals being
worked on. For context, I am one of the developers working on the
pg_tde[1] extension, as well as on the extensible SMGR proposal that
Konstantin already linked.

This patch/proposal contains two distinct parts of
encryption/extensibility: WAL and buffer manager/table data. Based on
earlier discussions, opinions on adding extension points to these
two differ considerably, and because of that I'm not sure bundling
them together is helpful.

It also appears to be missing some extension points that would be
required for a more complete encryption solution, such as encrypting
temporary files or system tables, or handling command-line utilities
like pg_waldump. Do you have ideas or patches in mind for those areas
as well?

I have the same question as Konstantin, why did you choose custom
hooks for the buffer manager instead of the already existing smgr
interface / extensibility patch? While that patch is not part of the
core (but I hope it will be), it is already used by multiple companies
as it supports other use cases, not only encryption. We plan to focus
more on that thread early next year, and we would appreciate any
feedback/suggestions that could make it better for others.

I also noticed that you added additional flags to the page header.
Initially we were thinking about something like this, but decided that
the fork files are better for any encryption (or other storage
related) extra data. These few bits try to be generic, while also
restrictive because of the limited amount of data. (and that data is
specifically per page, if I want something per file or per page range,
I still need a custom solution)

Regarding the WAL encryption part, we took a completely different
approach, similar to how we handle normal table data (page-based). I
will need to think more about this before I can provide meaningful
feedback on that part of the patch. One initial question, however, is
whether you have run detailed benchmarks with different workloads.
That seems to be the trickiest part there, since most of the code runs
in a critical section. (Not the "unused"/"empty hook" path, but the
overhead caused by a real encryption plugin using this hook in
practice)


[1]: https://github.com/percona/pg_tde
