Thread: Unified File API
Background
==========
PostgreSQL has an amazing variety of routines for accessing files. Consider just the “open file” routines:
PathNameOpenFile, OpenTemporaryFile, BasicOpenFile, open, fopen, BufFileCreateFileSet,
BufFileOpenFileSet, AllocateFile, OpenTransientFile, FileSetCreate, FileSetOpen, mdcreate, mdopen,
Smgr_open.
On the downside, “amazing variety” also means the code is somewhat confusing and new features are difficult to add.
Someday, we’d like to add encryption or compression to the various PostgreSQL files.
To do that, we need to bring all the relevant files into a common file API where we can implement
the new features.
Goals of Patch
==============
1) Unify file access so most of “the other” files can go through a common interface, allowing new features
   like checksums, encryption or compression to be added transparently.
2) Do it in a way which doesn’t change the logic of current code.
3) Convert a reasonable set of callers to use the new interface.
Note the focus is on the “other” files. The buffer cache and the WAL have similar needs,
but they are being handled in a separate project. (Yes, the two projects are coordinating.)
Patch 0001. Create a common file API.
===============================
Currently, PostgreSQL files feed into three funnels: 1) system file descriptors (read/write/open),
2) C library buffered files (fread/fwrite/fopen), and 3) virtual file descriptors (FileRead/FileWrite/PathNameOpenFile).
Of these three, virtual file descriptors (VFDs) are the most common. They are also the
only funnel which is implemented by PostgreSQL.
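To make the first two funnels concrete, here is a small standalone sketch (not code from the patch) that does the same job, counting the bytes in a file, once through a system file descriptor and once through a C library buffered file. The function names count_bytes_fd and count_bytes_stdio are made up for illustration. The third funnel, VFDs, lives inside the backend and is not shown here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Funnel 1: system file descriptor (open/read/close).
 * Returns the number of bytes in the file, or -1 on error.
 */
long
count_bytes_fd(const char *path)
{
    char    buf[4096];
    long    total = 0;
    ssize_t n;
    int     fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += (long) n;
    close(fd);
    return (n < 0) ? -1 : total;
}

/*
 * Funnel 2: C library buffered file (fopen/fread/fclose).
 * Same contract as above, but errors are detected with ferror().
 */
long
count_bytes_stdio(const char *path)
{
    char    buf[4096];
    long    total = 0;
    size_t  n;
    int     failed;
    FILE   *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        total += (long) n;
    failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : total;
}
```

The two versions differ mainly in how the position is tracked and how errors are reported, and those are the two gaps a single common interface has to cover.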
Decision: Choose VFDs as the common interface.
Problem: VFDs are random access only.
Solution: Add sequential read/write code on top of VFDs. (FileReadSeq, FileWriteSeq, FileSeek, FileTell, O_APPEND)
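As a rough illustration of the idea (not the patch's actual code), sequential access can be layered on a random-access primitive by remembering a position next to the descriptor. The sketch below uses plain POSIX pread/pwrite as a stand-in for the VFD calls. SeqFile, seq_read, seq_write, seq_seek and seq_tell are hypothetical names, and the append flag only imitates what O_APPEND would mean at this layer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a VFD: a descriptor plus a tracked position. */
typedef struct SeqFile
{
    int     fd;         /* only ever accessed with pread/pwrite */
    off_t   pos;        /* current sequential position */
    int     append;     /* nonzero: every write lands at end of file */
} SeqFile;

/* Read at the tracked position and advance it by the bytes read. */
ssize_t
seq_read(SeqFile *f, void *buf, size_t len)
{
    ssize_t n = pread(f->fd, buf, len, f->pos);

    if (n > 0)
        f->pos += n;
    return n;
}

/* Write at the tracked position (or at end of file when appending). */
ssize_t
seq_write(SeqFile *f, const void *buf, size_t len)
{
    ssize_t n;

    if (f->append)
    {
        struct stat st;

        if (fstat(f->fd, &st) < 0)
            return -1;
        f->pos = st.st_size;
    }
    n = pwrite(f->fd, buf, len, f->pos);
    if (n > 0)
        f->pos += n;
    return n;
}

/* Move the tracked position to an absolute offset; no system call needed. */
off_t
seq_seek(SeqFile *f, off_t offset)
{
    assert(offset >= 0);
    f->pos = offset;
    return f->pos;
}

/* Report the tracked position. */
off_t
seq_tell(const SeqFile *f)
{
    return f->pos;
}
```

Because the position lives in the wrapper rather than in the kernel, seek and tell are just bookkeeping, and the underlying random-access calls stay unchanged.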
Problem: VFDs have minimal error handling (based on errno).
Solution: Add an “ferror” style interface (FileError, FileEof, FileErrorCode, FileErrorMsg)
Problem: Must maintain compatibility with existing error handling code.
Solution: Save and restore errno to minimize changes to existing code.
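A minimal sketch of how the last two points could fit together, assuming a per-file sticky error state alongside the usual errno convention. This is illustrative only: ErrFile and the err_* functions are hypothetical names, and the signatures of the patch's FileError, FileEof, FileErrorCode and FileErrorMsg are not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical file handle carrying ferror()-style sticky state. */
typedef struct ErrFile
{
    int     fd;
    int     eof;        /* set once a read returns 0 bytes */
    int     errcode;    /* errno of the first failed operation, 0 if none */
} ErrFile;

/*
 * Read wrapper: records the outcome in the handle, then restores errno so
 * callers that still test errno after a failed call keep working.
 */
ssize_t
err_read(ErrFile *f, void *buf, size_t len)
{
    ssize_t n = read(f->fd, buf, len);
    int     save_errno = errno;

    if (n < 0 && f->errcode == 0)
        f->errcode = save_errno;
    else if (n == 0 && len > 0)
        f->eof = 1;

    /*
     * The bookkeeping above is trivial, but real code here (logging,
     * cleanup) may call functions that overwrite errno, hence the restore.
     */
    errno = save_errno;
    return n;
}

/* ferror()-style accessors over the sticky state. */
int
err_error(const ErrFile *f)
{
    return f->errcode != 0;
}

int
err_eof(const ErrFile *f)
{
    return f->eof;
}

int
err_code(const ErrFile *f)
{
    return f->errcode;
}

const char *
err_msg(const ErrFile *f)
{
    return strerror(f->errcode);
}
```

New callers can query the handle after a batch of operations, while existing callers that look at the return value and errno see the same behavior as before.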
Patch 0002. Update code to use the common file API
===========================================
The second patch alters callers so they use VFDs rather than system or C library files.
It doesn’t modify all callers, but it does capture many of the files which need
to be encrypted or compressed. This is definitely WIP.
Future (not too far away)
=====================
Looking ahead, there will be another set of patches which inject buffering and encryption into
the VFD interface. The future patches will build on the current work and introduce new “oflags”
to enable encryption and buffering.
Compression is also a possibility, but it is currently lower priority and a bit tricky for random access files.
Let us know if you have a use case.
On Thu, 29 Jun 2023 at 13:20, John Morris <john.morris@crunchydata.com> wrote:
> Compression is also a possibility, but currently lower priority and a bit tricky for random access files.
> Let us know if you have a use case.

CFbot shows a few compilation warnings/errors at [1]:
[15:54:06.825] ../src/backend/storage/file/fd.c:2420:11: warning: unused variable 'save_errno' [-Wunused-variable]
[15:54:06.825] int ret, save_errno;
[15:54:06.825] ^
[15:54:06.825] ../src/backend/storage/file/fd.c:4026:29: error: use of undeclared identifier 'MAXIMUM_VFD'
[15:54:06.825] Assert(file >= 0 && file < MAXIMUM_VFD);
[15:54:06.825] ^
[15:54:06.825] 1 warning and 1 error generated.

[1] - https://cirrus-ci.com/task/6552527404007424

Regards,
Vignesh
On Sat, 6 Jan 2024 at 22:58, vignesh C <vignesh21@gmail.com> wrote:
> CFbot shows few compilation warnings/error at [1]:
> [15:54:06.825] ../src/backend/storage/file/fd.c:2420:11: warning: unused variable 'save_errno' [-Wunused-variable]
> [15:54:06.825] ../src/backend/storage/file/fd.c:4026:29: error: use of undeclared identifier 'MAXIMUM_VFD'
> [15:54:06.825] 1 warning and 1 error generated.

With no update to the thread and the compilation still failing, I'm marking this as returned with feedback.
Please feel free to resubmit to the next CF when there is a new version of the patch.

Regards,
Vignesh