Thread: Temporary file access API

Temporary file access API

From
Antonin Houska
Date:
Here I'm starting a new thread to discuss a topic that's related to the
Transparent Data Encryption (TDE), but could be useful even without that.  The
problem has been addressed somehow in the Cybertec TDE fork, and I can post
the code here if it helps. However, after reading [1] (and the posts
upthread), I've got another idea, so let's try to discuss it first.

It makes sense to me if we first implement the buffering (i.e. writing/reading
certain amount of data at a time) and make the related functions aware of
encryption later: as long as we use a block cipher, we also need to read/write
(suitably sized) chunks rather than individual bytes (or arbitrary amounts of
data). (In theory, someone might need encryption but reject buffering, but I'm
not sure if this is a realistic use case.)

For the buffering, I imagine a "file stream" object that user creates on the
top of a file descriptor, such as

    FileStream  *FileStreamCreate(File file, int buffer_size)

    or

    FileStream  *FileStreamCreateFD(int fd, int buffer_size)

and uses functions like

    int FileStreamWrite(FileStream *stream, char *buffer, int amount)

    and

    int FileStreamRead(FileStream *stream, char *buffer, int amount)

to write and read data respectively.

Besides functions to close the streams explicitly (e.g. FileStreamClose() /
FileStreamFDClose()), we'd need to ensure automatic closing where that happens
to the file. For example, if OpenTemporaryFile() was used to obtain the file
descriptor, the user expects that the file will be closed and deleted on
transaction boundary, so the corresponding stream should be freed
automatically as well.

To avoid code duplication, buffile.c should use these streams internally as
well, as it also performs buffering. (Here we'd also need functions to change
reading/writing position.)

Once we implement the encryption, we might need add an argument to the
FileStreamCreate...() functions that helps to generate an unique IV, but the
...Read() / ...Write() functions would stay intact. And possibly one more
argument to specify the kind of cipher, in case we support more than one.

I think that's enough to start the discussion. Thanks for feedback in advance.

[1]
https://www.postgresql.org/message-id/CA%2BTgmoYGjN_f%3DFCErX49bzjhNG%2BGoctY%2Ba%2BXhNRWCVvDY8U74w%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Temporary file access API

From
Stephen Frost
Date:
Greetings,

* Antonin Houska (ah@cybertec.at) wrote:
> Here I'm starting a new thread to discuss a topic that's related to the
> Transparent Data Encryption (TDE), but could be useful even without that.  The
> problem has been addressed somehow in the Cybertec TDE fork, and I can post
> the code here if it helps. However, after reading [1] (and the posts
> upthread), I've got another idea, so let's try to discuss it first.

Yeah, I tend to agree that it makes sense to discuss the general idea of
doing buffered I/O for the temporary files that are being created today
with things like fwrite and have that be independent of encryption.  As
mentioned on that thread, doing our own buffering and having file
accesses done the same way throughout the system is just generally a
good thing and focusing on that will certainly help with any TDE effort
that may or may not happen later.

> It makes sense to me if we first implement the buffering (i.e. writing/reading
> certain amount of data at a time) and make the related functions aware of
> encryption later: as long as we use a block cipher, we also need to read/write
> (suitably sized) chunks rather than individual bytes (or arbitrary amounts of
> data). (In theory, someone might need encryption but reject buffering, but I'm
> not sure if this is a realistic use case.)

Agreed.

> For the buffering, I imagine a "file stream" object that user creates on the
> top of a file descriptor, such as
>
>     FileStream  *FileStreamCreate(File file, int buffer_size)
>
>     or
>
>     FileStream  *FileStreamCreateFD(int fd, int buffer_size)
>
> and uses functions like
>
>     int FileStreamWrite(FileStream *stream, char *buffer, int amount)
>
>     and
>
>     int FileStreamRead(FileStream *stream, char *buffer, int amount)
>
> to write and read data respectively.

Yeah, something along these lines was also what I had in mind, in
particular as it would mean fewer changes to places like reorderbuffer.c
(or, at least, very clear changes).

> Besides functions to close the streams explicitly (e.g. FileStreamClose() /
> FileStreamFDClose()), we'd need to ensure automatic closing where that happens
> to the file. For example, if OpenTemporaryFile() was used to obtain the file
> descriptor, the user expects that the file will be closed and deleted on
> transaction boundary, so the corresponding stream should be freed
> automatically as well.

Sure.  We do have some places that want to write using offsets and we'll
want to support that also, I'd think.

> To avoid code duplication, buffile.c should use these streams internally as
> well, as it also performs buffering. (Here we'd also need functions to change
> reading/writing position.)

Yeah... though we should really go through and make sure that we
understand the use-cases for each of the places that decided to use
their own code rather than what already existed, to the extent that we
can figure that out, and make sure that we're solving those cases and
not writing a bunch of new code that won't end up getting used.  We
really want to be sure that all writes are going through these code
paths and make sure there aren't any reasons for them not to be used.

> Once we implement the encryption, we might need add an argument to the
> FileStreamCreate...() functions that helps to generate an unique IV, but the
> ...Read() / ...Write() functions would stay intact. And possibly one more
> argument to specify the kind of cipher, in case we support more than one.

As long as we're only needing to pass these into the Create function,
that seems reasonable to me.  I wouldn't want to go through the effort
of this and then still have to change every single place we read/write
using this system.

Thanks,

Stephen

Attachment

Re: Temporary file access API

From
Antonin Houska
Date:
Stephen Frost <sfrost@snowman.net> wrote:

> * Antonin Houska (ah@cybertec.at) wrote:
> > Here I'm starting a new thread to discuss a topic that's related to the
> > Transparent Data Encryption (TDE), but could be useful even without that.  The
> > problem has been addressed somehow in the Cybertec TDE fork, and I can post
> > the code here if it helps. However, after reading [1] (and the posts
> > upthread), I've got another idea, so let's try to discuss it first.
> 
> Yeah, I tend to agree that it makes sense to discuss the general idea of
> doing buffered I/O for the temporary files that are being created today
> with things like fwrite and have that be independent of encryption.  As
> mentioned on that thread, doing our own buffering and having file
> accesses done the same way throughout the system is just generally a
> good thing and focusing on that will certainly help with any TDE effort
> that may or may not happen later.

Thanks for your comments, the initial version is attached here.

The buffer size is not configurable yet. Besides the new API itself, the patch
series shows how it can be used to store logical replication data changes as
well as statistics. Further comments are welcome, thanks in advance.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

Re: Temporary file access API

From
Robert Haas
Date:
On Tue, Mar 8, 2022 at 6:12 AM Antonin Houska <ah@cybertec.at> wrote:
> Thanks for your comments, the initial version is attached here.

I've been meaning to look at this thread for some time but have not
found enough time to do that until just now. And now I have
questions...

1. Supposing we accepted this, how widely do you think that we could
adopt it? I see that this patch set adapts a few places to use it and
that's nice, but I have a feeling there's a lot more places that are
making use of system calls directly, or through some other
abstraction, than just this. I'm not sure that we actually want to use
something like this everywhere, but what would be the rule for
deciding where to use it and where not to use
it? If the plan for this facility is just to adapt these two
particular places to use it, that doesn't feel like enough to be
worthwhile.

2. What about frontend code? Most frontend code won't examine data
files directly, but at least pg_controldata, pg_checksums, and
pg_resetwal are exceptions.

3. How would we extend this to support encryption? Add an extra
argument to BufFileStreamInit(V)FD passing down the encryption
details?

There are some smaller things about the patch with which I'm not 100%
comfortable, but I'd like to start by understanding the big picture.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Temporary file access API

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, Mar 8, 2022 at 6:12 AM Antonin Houska <ah@cybertec.at> wrote:
> > Thanks for your comments, the initial version is attached here.
>
> I've been meaning to look at this thread for some time but have not
> found enough time to do that until just now. And now I have
> questions...
>
> 1. Supposing we accepted this, how widely do you think that we could
> adopt it? I see that this patch set adapts a few places to use it and
> that's nice, but I have a feeling there's a lot more places that are
> making use of system calls directly, or through some other
> abstraction, than just this. I'm not sure that we actually want to use
> something like this everywhere, but what would be the rule for
> deciding where to use it and where not to use
> it? If the plan for this facility is just to adapt these two
> particular places to use it, that doesn't feel like enough to be
> worthwhile.

Admittedly I viewed the problem from the perspective of the TDE, so I haven't
spent much time looking for other opportunities. Now, with the stats collector
using shared memory, even one of the use cases implemented here no longer
exists. I need to do more research.

Do you think that the use of a system call is a problem itself (e.g. because
the code looks less simple if read/write is used somewhere and fread/fwrite
elsewhere; of course of read/write is mandatory in special cases like WAL,
heap pages, etc.)  or is the problem that the system calls are used too
frequently? I suppose only the latter.

Anyway, I'm not sure there are *many* places where system calls are used too
frequently. Instead, the coding uses to be such that the information is first
assembled in memory and then written to file at once. So the value of the
(buffered) stream is that it makes the code simpler (eliminates the need to
prepare the data in memory). That's what I tried to do for reorderbuffer.c and
pgstat.c in my patch.

Related question is whether we should try to replace some uses of the libc
stream (FILE *) at some places. You seem to suggest that in [1]. One example
is snapmgr.c:ExportSnapshot(), if we also implement output formatting. Of
course there are places where (FILE *) cannot be replaced because, besides
regular file, the code needs to work with stdin/stdout in general. (Parsing of
configuration files falls into this category, but that doesn't matter because
bison-generated parser seems to implement buffering anyway.)

> 2. What about frontend code? Most frontend code won't examine data
> files directly, but at least pg_controldata, pg_checksums, and
> pg_resetwal are exceptions.

If the frequency of using system calls is the problem, then I wouldn't change
these because ControlFileData structure needs to be initialized in memory
anyway and then written at once. And pg_checksums reads whole blocks
anyway. I'll take a closer look.

> 3. How would we extend this to support encryption? Add an extra
> argument to BufFileStreamInit(V)FD passing down the encryption
> details?

Yes.

> There are some smaller things about the patch with which I'm not 100%
> comfortable, but I'd like to start by understanding the big picture.

Thanks!

[1] https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Temporary file access API

From
Robert Haas
Date:
On Fri, Apr 8, 2022 at 4:54 AM Antonin Houska <ah@cybertec.at> wrote:
> Do you think that the use of a system call is a problem itself (e.g. because
> the code looks less simple if read/write is used somewhere and fread/fwrite
> elsewhere; of course of read/write is mandatory in special cases like WAL,
> heap pages, etc.)  or is the problem that the system calls are used too
> frequently? I suppose only the latter.

I'm not really super-interested in reducing the number of system
calls. It's not a dumb thing in which to be interested and I know that
for example Thomas Munro is very interested in it and has done a bunch
of work in that direction just to improve performance. But for me the
attraction of this is mostly whether it gets us closer to TDE.

And that's why I'm asking these questions about adopting it in
different places. I kind of thought that your previous patches needed
to encrypt, I don't know, 10 or 20 different kinds of files. So I was
surprised to see this patch touching the handling of only 2 kinds of
files. If we consolidate the handling of let's say 15 of 20 cases into
a single mechanism, we've really moved the needle in the right
direction -- but consolidating the handling of 2 of 20 cases, or
whatever the real numbers are, isn't very exciting.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Temporary file access API

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Apr 8, 2022 at 4:54 AM Antonin Houska <ah@cybertec.at> wrote:
> > Do you think that the use of a system call is a problem itself (e.g. because
> > the code looks less simple if read/write is used somewhere and fread/fwrite
> > elsewhere; of course of read/write is mandatory in special cases like WAL,
> > heap pages, etc.)  or is the problem that the system calls are used too
> > frequently? I suppose only the latter.
>
> I'm not really super-interested in reducing the number of system
> calls. It's not a dumb thing in which to be interested and I know that
> for example Thomas Munro is very interested in it and has done a bunch
> of work in that direction just to improve performance. But for me the
> attraction of this is mostly whether it gets us closer to TDE.
>
> And that's why I'm asking these questions about adopting it in
> different places. I kind of thought that your previous patches needed
> to encrypt, I don't know, 10 or 20 different kinds of files. So I was
> surprised to see this patch touching the handling of only 2 kinds of
> files. If we consolidate the handling of let's say 15 of 20 cases into
> a single mechanism, we've really moved the needle in the right
> direction -- but consolidating the handling of 2 of 20 cases, or
> whatever the real numbers are, isn't very exciting.

There are't really that many kinds of files to encrypt:

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data

(And pg_stat/* files should be removed from the list.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Temporary file access API

From
Robert Haas
Date:
On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska <ah@cybertec.at> wrote:
> There are't really that many kinds of files to encrypt:
>
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
>
> (And pg_stat/* files should be removed from the list.)

This kind of gets into some theoretical questions. Like, do we think
that it's an information leak if people can look at how many
transactions are committing and aborting in pg_xact_status? In theory
it could be, but I know it's been argued that that's too much of a
side channel. I'm not sure I believe that, but it's arguable.
Similarly, the argument that global/pg_internal.init doesn't contain
user data relies on the theory that the only table data that will make
its way into the file is for system catalogs. I guess that's not user
data *exactly* but ... are we sure that's how we want to roll here?

I really don't know how you can argue that pg_dynshmem/mmap.NNNNNNN
doesn't contain user data - surely it can.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Temporary file access API

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska <ah@cybertec.at> wrote:
> > There are't really that many kinds of files to encrypt:
> >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> >
> > (And pg_stat/* files should be removed from the list.)
>
> This kind of gets into some theoretical questions. Like, do we think
> that it's an information leak if people can look at how many
> transactions are committing and aborting in pg_xact_status? In theory
> it could be, but I know it's been argued that that's too much of a
> side channel. I'm not sure I believe that, but it's arguable.

I was referring to the fact that the statistics are no longer stored in files:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd

> Similarly, the argument that global/pg_internal.init doesn't contain
> user data relies on the theory that the only table data that will make
> its way into the file is for system catalogs. I guess that's not user
> data *exactly* but ... are we sure that's how we want to roll here?

Yes, this is worth attention.

> I really don't know how you can argue that pg_dynshmem/mmap.NNNNNNN
> doesn't contain user data - surely it can.

Good point. Since postgres does not control writing into this file, it's a
special case though. (Maybe TDE will have to reject to start if
dynamic_shared_memory_type is set to mmap and the instance is encrypted.)

Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Temporary file access API

From
Robert Haas
Date:
On Tue, Apr 12, 2022 at 5:30 AM Antonin Houska <ah@cybertec.at> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska <ah@cybertec.at> wrote:
> > > There are't really that many kinds of files to encrypt:
> > >
> > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> > >
> > > (And pg_stat/* files should be removed from the list.)
> >
> > This kind of gets into some theoretical questions. Like, do we think
> > that it's an information leak if people can look at how many
> > transactions are committing and aborting in pg_xact_status? In theory
> > it could be, but I know it's been argued that that's too much of a
> > side channel. I'm not sure I believe that, but it's arguable.
>
> I was referring to the fact that the statistics are no longer stored in files:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd

Oh, yeah, I agree with that. I was saying that I'm not altogether a
believer in the idea that we need not encrypt SLRU files.

TBH, I think that no matter what we do there are going to be side
channel leaks, where some security researcher demonstrates that they
can infer something based on file length or file creation rate or
something. It seems inevitable. But the more stuff we don't encrypt,
the easier those attacks are going to be. On the other hand,
encrypting more stuff also makes the project harder. It's hard for me
to judge what the right balance is here. Maybe it's OK to exclude that
stuff for an initial version and just disclaim the heck out of it, but
I don't think that should be our long term strategy.

> > I really don't know how you can argue that pg_dynshmem/mmap.NNNNNNN
> > doesn't contain user data - surely it can.
>
> Good point. Since postgres does not control writing into this file, it's a
> special case though. (Maybe TDE will have to reject to start if
> dynamic_shared_memory_type is set to mmap and the instance is encrypted.)

Interesting point.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Temporary file access API

From
Matthias van de Meent
Date:
On Mon, 11 Apr 2022 at 10:05, Antonin Houska <ah@cybertec.at> wrote:
>
> Robert Haas <robertmhaas@gmail.com> wrote:
>
> > On Fri, Apr 8, 2022 at 4:54 AM Antonin Houska <ah@cybertec.at> wrote:
> > > Do you think that the use of a system call is a problem itself (e.g. because
> > > the code looks less simple if read/write is used somewhere and fread/fwrite
> > > elsewhere; of course of read/write is mandatory in special cases like WAL,
> > > heap pages, etc.)  or is the problem that the system calls are used too
> > > frequently? I suppose only the latter.
> >
> > I'm not really super-interested in reducing the number of system
> > calls. It's not a dumb thing in which to be interested and I know that
> > for example Thomas Munro is very interested in it and has done a bunch
> > of work in that direction just to improve performance. But for me the
> > attraction of this is mostly whether it gets us closer to TDE.
> >
> > And that's why I'm asking these questions about adopting it in
> > different places. I kind of thought that your previous patches needed
> > to encrypt, I don't know, 10 or 20 different kinds of files. So I was
> > surprised to see this patch touching the handling of only 2 kinds of
> > files. If we consolidate the handling of let's say 15 of 20 cases into
> > a single mechanism, we've really moved the needle in the right
> > direction -- but consolidating the handling of 2 of 20 cases, or
> > whatever the real numbers are, isn't very exciting.
>
> There are't really that many kinds of files to encrypt:
>
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
>
> (And pg_stat/* files should be removed from the list.)

I was looking at that list of files that contain user data, and
noticed that all relation forks except the main fork were marked as
'does not contain user data'. To me this seems not necessarily true:
AMs do have access to forks for user data storage as well (without any
real issues or breaking the abstraction), and the init-fork is
expected to store user data (specifically in the form of unlogged
sequences). Shouldn't those forks thus also be encrypted-by-default,
or should we provide some other method to ensure that non-main forks
with user data are encrypted?


- Matthias



Re: Temporary file access API

From
Antonin Houska
Date:
Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

> On Mon, 11 Apr 2022 at 10:05, Antonin Houska <ah@cybertec.at> wrote:
> >
> > There are't really that many kinds of files to encrypt:
> >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> >
> > (And pg_stat/* files should be removed from the list.)
>
> I was looking at that list of files that contain user data, and
> noticed that all relation forks except the main fork were marked as
> 'does not contain user data'. To me this seems not necessarily true:
> AMs do have access to forks for user data storage as well (without any
> real issues or breaking the abstraction), and the init-fork is
> expected to store user data (specifically in the form of unlogged
> sequences). Shouldn't those forks thus also be encrypted-by-default,
> or should we provide some other method to ensure that non-main forks
> with user data are encrypted?

Thanks. I've updated the wiki page (also included Robert's comments).

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Temporary file access API

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Apr 12, 2022 at 5:30 AM Antonin Houska <ah@cybertec.at> wrote:
> > Robert Haas <robertmhaas@gmail.com> wrote:
> > > On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska <ah@cybertec.at> wrote:
> > > > There are't really that many kinds of files to encrypt:
> > > >
> > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> > > >
> > > > (And pg_stat/* files should be removed from the list.)
> > >
> > > This kind of gets into some theoretical questions. Like, do we think
> > > that it's an information leak if people can look at how many
> > > transactions are committing and aborting in pg_xact_status? In theory
> > > it could be, but I know it's been argued that that's too much of a
> > > side channel. I'm not sure I believe that, but it's arguable.
> >
> > I was referring to the fact that the statistics are no longer stored in files:
> >
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd
>
> Oh, yeah, I agree with that. I was saying that I'm not altogether a
> believer in the idea that we need not encrypt SLRU files.
>
> TBH, I think that no matter what we do there are going to be side
> channel leaks, where some security researcher demonstrates that they
> can infer something based on file length or file creation rate or
> something. It seems inevitable. But the more stuff we don't encrypt,
> the easier those attacks are going to be. On the other hand,
> encrypting more stuff also makes the project harder. It's hard for me
> to judge what the right balance is here. Maybe it's OK to exclude that
> stuff for an initial version and just disclaim the heck out of it, but
> I don't think that should be our long term strategy.

I agree that there's undoubtably going to be side-channel attacks, some
of which we may never be able to address, but we should be looking to
try and do as much as we can while also disclaiming that which we can't.

I'd lay this out in steps along these lines:

- Primary data is encrypted (and, ideally, optionally authenticated)

This is basically what the current state of things are with the
patches that have been posted in the past and would be a fantastic first
step that would get us past a lot of the auditors and others who are
unhappy that they can simply 'grep' a PG data directory for user data.
This also generally addresses off-line attacks, backups, etc.  This also
puts us on a similar level as other vendors who offer TDE, as I
understand it.

- Encryption / authentication of metadata (SLRUs, maybe more)

This would be a great next step and would move us in the direction of
having a system that's resiliant against online storage-level attacks.
As for how to get there, I would think we'd start with coming up with a
way to at least have good checksums for SLRUs that are just generally
available and users are encouraged to enable, and then have that done in
a way that we could store an authentication tag instead in the future.

> > > I really don't know how you can argue that pg_dynshmem/mmap.NNNNNNN
> > > doesn't contain user data - surely it can.
> >
> > Good point. Since postgres does not control writing into this file, it's a
> > special case though. (Maybe TDE will have to reject to start if
> > dynamic_shared_memory_type is set to mmap and the instance is encrypted.)
>
> Interesting point.

Yeah, this is an interesting case and it really depends on what attack
vector is being considered and how the system is configured.  I don't
think it's too much of an issue to say that you shouldn't use TDE with
mmap (I'm not huge on the idea of explicitly preventing someone from
doing it if they really want though..).

Thanks,

Stephen

Attachment

Re: Temporary file access API

From
Bruce Momjian
Date:
On Mon, Apr 11, 2022 at 04:34:18PM -0400, Robert Haas wrote:
> On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska <ah@cybertec.at> wrote:
> > There are't really that many kinds of files to encrypt:
> >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> >
> > (And pg_stat/* files should be removed from the list.)
> 
> This kind of gets into some theoretical questions. Like, do we think
> that it's an information leak if people can look at how many
> transactions are committing and aborting in pg_xact_status? In theory
> it could be, but I know it's been argued that that's too much of a
> side channel. I'm not sure I believe that, but it's arguable.
> Similarly, the argument that global/pg_internal.init doesn't contain
> user data relies on the theory that the only table data that will make
> its way into the file is for system catalogs. I guess that's not user
> data *exactly* but ... are we sure that's how we want to roll here?

I don't think we want to be encrypting pg_xact/, so they can get the
transaction commit rate from there.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson




Re: Temporary file access API

From
Robert Haas
Date:
On Wed, Apr 13, 2022 at 6:25 PM Bruce Momjian <bruce@momjian.us> wrote:
> I don't think we want to be encrypting pg_xact/, so they can get the
> transaction commit rate from there.

I think it would be a good idea to eventually encrypt SLRU data,
including pg_xact. Maybe not in the first version.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Temporary file access API

From
Stephen Frost
Date:
Greetings,

On Wed, Apr 13, 2022 at 18:54 Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 13, 2022 at 6:25 PM Bruce Momjian <bruce@momjian.us> wrote:
> I don't think we want to be encrypting pg_xact/, so they can get the
> transaction commit rate from there.

I think it would be a good idea to eventually encrypt SLRU data,
including pg_xact. Maybe not in the first version.

I agree with Robert, on both parts.

I do think getting checksums for that data may be the first sensible step.

Thanks,

Stephen

Re: Temporary file access API

From
Bruce Momjian
Date:
On Wed, Apr 13, 2022 at 06:54:13PM -0400, Robert Haas wrote:
> On Wed, Apr 13, 2022 at 6:25 PM Bruce Momjian <bruce@momjian.us> wrote:
> > I don't think we want to be encrypting pg_xact/, so they can get the
> > transaction commit rate from there.
> 
> I think it would be a good idea to eventually encrypt SLRU data,
> including pg_xact. Maybe not in the first version.

I assume that would be very hard or slow, and of limited value.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson




Re: Temporary file access API

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, Apr 12, 2022 at 5:30 AM Antonin Houska <ah@cybertec.at> wrote:
> > Robert Haas <robertmhaas@gmail.com> wrote:
> > > On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska <ah@cybertec.at> wrote:
> > > > There are't really that many kinds of files to encrypt:
> > > >
> > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> > > >
> > > > (And pg_stat/* files should be removed from the list.)
> > >
> > > This kind of gets into some theoretical questions. Like, do we think
> > > that it's an information leak if people can look at how many
> > > transactions are committing and aborting in pg_xact_status? In theory
> > > it could be, but I know it's been argued that that's too much of a
> > > side channel. I'm not sure I believe that, but it's arguable.
> >
> > I was referring to the fact that the statistics are no longer stored in files:
> >
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd
> 
> Oh, yeah, I agree with that.

I see now that the statistics are yet saved to a file on server shutdown. I've
updated the wiki page.

Attached is a new version of the patch, to evaluate what the API use in the
backend could look like. I haven't touched places where the file is accessed
in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is
called.

Another use case might be copying one file to another via a buffer. Something
like

    BufFileCopy(int dstfd, int srcfd, int bufsize)

The obvious call site would be in copydir.c:copy_file(), but I think there are
a few more in the server code.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

Re: Temporary file access API

From
Antonin Houska
Date:
Antonin Houska <ah@cybertec.at> wrote:

> Robert Haas <robertmhaas@gmail.com> wrote:
> 
> > On Tue, Apr 12, 2022 at 5:30 AM Antonin Houska <ah@cybertec.at> wrote:
> > > Robert Haas <robertmhaas@gmail.com> wrote:
> > > > On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska <ah@cybertec.at> wrote:
> > > > > There are't really that many kinds of files to encrypt:
> > > > >
> > > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> > > > >
> > > > > (And pg_stat/* files should be removed from the list.)
> > > >
> > > > This kind of gets into some theoretical questions. Like, do we think
> > > > that it's an information leak if people can look at how many
> > > > transactions are committing and aborting in pg_xact_status? In theory
> > > > it could be, but I know it's been argued that that's too much of a
> > > > side channel. I'm not sure I believe that, but it's arguable.
> > >
> > > I was referring to the fact that the statistics are no longer stored in files:
> > >
> > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd
> > 
> > Oh, yeah, I agree with that.
> 
> I see now that the statistics are yet saved to a file on server shutdown. I've
> updated the wiki page.
> 
> Attached is a new version of the patch, to evaluate what the API use in the
> backend could look like. I haven't touched places where the file is accessed
> in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is
> called.

Rebased patch set is attached here, which applies to the current master.

(A few more opportunities for the new API implemented here.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

Re: Temporary file access API

From
Peter Eisentraut
Date:
On 04.07.22 18:35, Antonin Houska wrote:
>> Attached is a new version of the patch, to evaluate what the API use in the
>> backend could look like. I haven't touched places where the file is accessed
>> in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is
>> called.
> Rebased patch set is attached here, which applies to the current master.
> 
> (A few more opportunities for the new API implemented here.)

I don't understand what this patch set is supposed to do.  AFAICT, the 
thread originally forked off from a TDE discussion and, considering the 
thread subject, was possibly once an attempt to refactor temporary file 
access to make integrating encryption easier?  The discussion then 
talked about things like saving on system calls and excising all 
OS-level file access API use, which I found confusing, and the thread 
then just became a general TDE-related mini-discussion.

The patches at hand extend some internal file access APIs and then 
sprinkle them around various places in the backend code, but there is no 
explanation why or how this is better.  I don't see any real structural 
savings one might expect from a refactoring patch.  No information has 
been presented how this might help encryption.

I also suspect that changing around the use of various file access APIs 
needs to be carefully evaluated in each individual case.  Various code 
has subtle expectations about atomic write behavior, syncing, flushing, 
error recovery, etc.  I don't know if this has been considered here.



Re: Temporary file access API

From
Antonin Houska
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

> On 04.07.22 18:35, Antonin Houska wrote:
> >> Attached is a new version of the patch, to evaluate what the API use in the
> >> backend could look like. I haven't touched places where the file is accessed
> >> in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is
> >> called.
> > Rebased patch set is attached here, which applies to the current master.
> > (A few more opportunities for the new API implemented here.)
>
> I don't understand what this patch set is supposed to do.  AFAICT, the thread
> originally forked off from a TDE discussion and, considering the thread
> subject, was possibly once an attempt to refactor temporary file access to
> make integrating encryption easier?  The discussion then talked about things
> like saving on system calls and excising all OS-level file access API use,
> which I found confusing, and the thread then just became a general TDE-related
> mini-discussion.

Yes, it's an attempt to make the encryption less invasive, but there are a few
other objectives, at least: 1) to make the read / write operations "less
low-level" (there are common things like error handling which are often just
copy & pasted from other places), 2) to have buffered I/O with configurable
buffer size (the current patch version still has fixed buffer size though)

It's true that the discussion ends up to be encryption-specific, however the
scope of the patch is broader. The first meassage of the thread references a
related discussion

https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com

which is more important for this patch than the suggestions about encryption.

> The patches at hand extend some internal file access APIs and then sprinkle
> them around various places in the backend code, but there is no explanation
> why or how this is better. I don't see any real structural savings one might
> expect from a refactoring patch.  No information has been presented how this
> might help encryption.

At this stage I expected feedback from the developers who have already
contributed to the discussion, because I'm not sure myself if this version
fits their ideas - that's why I didn't elaborate the documentation yet. I'll
try to summarize my understanding in the next version, but I'd appreciate if I
got some feedback for the current version first.

> I also suspect that changing around the use of various file access APIs needs
> to be carefully evaluated in each individual case.  Various code has subtle
> expectations about atomic write behavior, syncing, flushing, error recovery,
> etc.  I don't know if this has been considered here.

I considered that, but could have messed up at some places. Right now I'm
aware of one problem: pgstat.c does not expect the file access API to raise
ERROR - this needs to be fixed.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Temporary file access API

From
Antonin Houska
Date:
Antonin Houska <ah@cybertec.at> wrote:

> Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> 
> > On 04.07.22 18:35, Antonin Houska wrote:
> > >> Attached is a new version of the patch, to evaluate what the API use in the
> > >> backend could look like. I haven't touched places where the file is accessed
> > >> in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is
> > >> called.
> > > Rebased patch set is attached here, which applies to the current master.
> > > (A few more opportunities for the new API implemented here.)
> > 
> > I don't understand what this patch set is supposed to do.  AFAICT, the thread
> > originally forked off from a TDE discussion and, considering the thread
> > subject, was possibly once an attempt to refactor temporary file access to
> > make integrating encryption easier?  The discussion then talked about things
> > like saving on system calls and excising all OS-level file access API use,
> > which I found confusing, and the thread then just became a general TDE-related
> > mini-discussion.
> 
> Yes, it's an attempt to make the encryption less invasive, but there are a few
> other objectives, at least: 1) to make the read / write operations "less
> low-level" (there are common things like error handling which are often just
> copy & pasted from other places), 2) to have buffered I/O with configurable
> buffer size (the current patch version still has fixed buffer size though)
> 
> It's true that the discussion ends up to be encryption-specific, however the
> scope of the patch is broader. The first meassage of the thread references a
> related discussion
> 
> https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com
> 
> which is more important for this patch than the suggestions about encryption.
> 
> > The patches at hand extend some internal file access APIs and then sprinkle
> > them around various places in the backend code, but there is no explanation
> > why or how this is better. I don't see any real structural savings one might
> > expect from a refactoring patch.  No information has been presented how this
> > might help encryption.
> 
> At this stage I expected feedback from the developers who have already
> contributed to the discussion, because I'm not sure myself if this version
> fits their ideas - that's why I didn't elaborate the documentation yet. I'll
> try to summarize my understanding in the next version, but I'd appreciate if I
> got some feedback for the current version first.
> 
> > I also suspect that changing around the use of various file access APIs needs
> > to be carefully evaluated in each individual case.  Various code has subtle
> > expectations about atomic write behavior, syncing, flushing, error recovery,
> > etc.  I don't know if this has been considered here.
> 
> I considered that, but could have messed up at some places. Right now I'm
> aware of one problem: pgstat.c does not expect the file access API to raise
> ERROR - this needs to be fixed.

Attached is a new version. It allows the user to set "elevel" (i.e. ERROR is
not necessarily thrown on I/O failure, if the user prefers to check the number
of bytes read/written) and to specify the buffer size. Also, 0015 adds a
function to copy data from one file descriptor to another.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

Re: Temporary file access API

From
Robert Haas
Date:
On Fri, Jul 29, 2022 at 11:36 AM Antonin Houska <ah@cybertec.at> wrote:
> Attached is a new version. It allows the user to set "elevel" (i.e. ERROR is
> not necessarily thrown on I/O failure, if the user prefers to check the number
> of bytes read/written) and to specify the buffer size. Also, 0015 adds a
> function to copy data from one file descriptor to another.

I basically agree with Peter Eisentraut's earlier comment: "I don't
understand what this patch set is supposed to do." The intended
advantages, as documented in buffile.c, are:

+ * 1. Less code is needed to access the file.
+ *
+ * 2.. Buffering reduces the number of I/O system calls.
+ *
+ * 3. The buffer size can be controlled by the user. (The larger the buffer,
+ * the fewer I/O system calls are needed, but the more data needs to be
+ * written to the buffer before the user recognizes that the file access
+ * failed.)
+ *
+ * 4. It should make features like Transparent Data Encryption less invasive.

It does look to me like there is a modest reduction in the total
amount of code from these patches, so I do think they might be
achieving (1).

I'm not so sure about (2), though. A number of these places are cases
where we only do a single write to the file - like in patches 0010,
0011, 0012, and 0014, for example. And even in 0013, where we do
multiple I/Os, it makes no sense to introduce a second buffering
layer. We're reading from a file into a buffer several kilobytes in
size and then shipping the data out. The places where you want
buffering are places where we might be doing system calls for a few
bytes of I/O at a time, not places where we're already doing
relatively large I/Os. An extra layer of buffering probably makes
things slower, not faster.

I don't think that (3) is a separate advantage from (1) and (2), so I
don't have anything separate to say about it.

I find it really unclear whether, or how, it achieves (4). In general,
I don't think that the large number of raw calls to read() and write()
spread across the backend is a particularly good thing. TDE is an
example of a feature that might affect every byte we read or write,
but you can also imagine instrumentation and tracing facilities that
might want to be able to do something every time we do an I/O without
having to modify a zillion separate call sites. Such features can
certainly benefit from consolidation, but I suspect it isn't helpful
to treat every I/O in exactly the same way. For instance, let's say we
settle on a design where everything in the cluster is encrypted but,
just to make things simple for the sake of discussion, let's say there
are no stored nonces anywhere. Can we just route every I/O in the
backend through a wrapper layer that does encryption and decryption?

Probably not. I imagine postgresql.conf and pg_hba.conf aren't going
to be encrypted, for example, because they're intended to be edited by
users. And Bruce has previously proposed designs where the WAL would
be encrypted with a different key than the data files. Another idea,
which I kind of like, is to encrypt WAL on a record level rather than
a block level. Either way, you can't just encrypt every single byte
the same way. There are probably other relevant distinctions: some
data is temporary and need not survive a server restart or be shared
with other backends (except maybe in the case of parallel query) and
some data is permanent. Some data is already block-structured using
blocks of size BLCKSZ; some data is block-structured using some other
block size; and some data is not block-structured. Some data is in
standard-format pages that are part of the buffer pool and some data
isn't. I feel like some of those distinctions probably matter to TDE,
and maybe to other things that we might want to do.

For instance, to go back to my earlier comments, if the data is
already block-structured, we do not need to insert a second layer of
buffering; if it isn't, maybe we should, not just for TDE but for
other reasons as well. If the data is temporary, TDE might want to
encrypt it using a temporary key which is generated on the fly and
only stored in server memory, but if it's data that needs to be
readable after a server restart, we're going to need to use a key that
we'll still have access to in the future. Or maybe that particular
thing isn't relevant, but I just can't believe that every single I/O
needs exactly the same treatment, so there have got to be some
patterns here that do matter. And I can't really tell whether this
patch set has a theory about that and, if so, what it is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Temporary file access API

From
Antonin Houska
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Jul 29, 2022 at 11:36 AM Antonin Houska <ah@cybertec.at> wrote:
> > Attached is a new version. It allows the user to set "elevel" (i.e. ERROR is
> > not necessarily thrown on I/O failure, if the user prefers to check the number
> > of bytes read/written) and to specify the buffer size. Also, 0015 adds a
> > function to copy data from one file descriptor to another.
>
> I basically agree with Peter Eisentraut's earlier comment: "I don't
> understand what this patch set is supposed to do." The intended
> advantages, as documented in buffile.c, are:
>
> + * 1. Less code is needed to access the file.
> + *
> + * 2.. Buffering reduces the number of I/O system calls.
> + *
> + * 3. The buffer size can be controlled by the user. (The larger the buffer,
> + * the fewer I/O system calls are needed, but the more data needs to be
> + * written to the buffer before the user recognizes that the file access
> + * failed.)
> + *
> + * 4. It should make features like Transparent Data Encryption less invasive.
>
> It does look to me like there is a modest reduction in the total
> amount of code from these patches, so I do think they might be
> achieving (1).
>
> I'm not so sure about (2), though. A number of these places are cases
> where we only do a single write to the file - like in patches 0010,
> 0011, 0012, and 0014, for example. And even in 0013, where we do
> multiple I/Os, it makes no sense to introduce a second buffering
> layer. We're reading from a file into a buffer several kilobytes in
> size and then shipping the data out. The places where you want
> buffering are places where we might be doing system calls for a few
> bytes of I/O at a time, not places where we're already doing
> relatively large I/Os. An extra layer of buffering probably makes
> things slower, not faster.

The buffering seemed to me like a good opportunity to plug-in the encryption,
because the data needs to be encrypted in memory before it's written to
disk. Of course it'd be better to use the existing buffering for that than to
add another layer.

> I don't think that (3) is a separate advantage from (1) and (2), so I
> don't have anything separate to say about it.

I thought that the uncontrollable buffer size is one of the things you
complaint about in

https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com

> I find it really unclear whether, or how, it achieves (4). In general,
> I don't think that the large number of raw calls to read() and write()
> spread across the backend is a particularly good thing. TDE is an
> example of a feature that might affect every byte we read or write,
> but you can also imagine instrumentation and tracing facilities that
> might want to be able to do something every time we do an I/O without
> having to modify a zillion separate call sites. Such features can
> certainly benefit from consolidation, but I suspect it isn't helpful
> to treat every I/O in exactly the same way. For instance, let's say we
> settle on a design where everything in the cluster is encrypted but,
> just to make things simple for the sake of discussion, let's say there
> are no stored nonces anywhere. Can we just route every I/O in the
> backend through a wrapper layer that does encryption and decryption?

> Probably not. I imagine postgresql.conf and pg_hba.conf aren't going
> to be encrypted, for example, because they're intended to be edited by
> users. And Bruce has previously proposed designs where the WAL would
> be encrypted with a different key than the data files. Another idea,
> which I kind of like, is to encrypt WAL on a record level rather than
> a block level. Either way, you can't just encrypt every single byte
> the same way. There are probably other relevant distinctions: some
> data is temporary and need not survive a server restart or be shared
> with other backends (except maybe in the case of parallel query) and
> some data is permanent. Some data is already block-structured using
> blocks of size BLCKSZ; some data is block-structured using some other
> block size; and some data is not block-structured. Some data is in
> standard-format pages that are part of the buffer pool and some data
> isn't. I feel like some of those distinctions probably matter to TDE,
> and maybe to other things that we might want to do.
>
> For instance, to go back to my earlier comments, if the data is
> already block-structured, we do not need to insert a second layer of
> buffering; if it isn't, maybe we should, not just for TDE but for
> other reasons as well. If the data is temporary, TDE might want to
> encrypt it using a temporary key which is generated on the fly and
> only stored in server memory, but if it's data that needs to be
> readable after a server restart, we're going to need to use a key that
> we'll still have access to in the future. Or maybe that particular
> thing isn't relevant, but I just can't believe that every single I/O
> needs exactly the same treatment, so there have got to be some
> patterns here that do matter. And I can't really tell whether this
> patch set has a theory about that and, if so, what it is.

I didn't want to use this API for relations (file reads/writes for these only
affects a couple of places in md.c), nor for WAL. The intention was to use the
API for temporary files, and also for other kinds of "auxiliary" files where
the current coding is rather verbose but not too special.

To enable the encryption, we'd need to add an extra argument (structure) to
the appropriate function, which provides the necessary information on the
encryption, such as cipher kind (block/stream), the encryption key or the
nonce. I just don't understand whether the TDE alone (4) justifies patch like
this, so I tried to make more use of it.

On the other hand, if code simplification (1) should be the only benefit, the
patch might be too controversial as well.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Temporary file access API

From
Bruce Momjian
Date:
On Mon, Aug  8, 2022 at 08:27:27PM +0200, Antonin Houska wrote:
> > For instance, to go back to my earlier comments, if the data is
> > already block-structured, we do not need to insert a second layer of
> > buffering; if it isn't, maybe we should, not just for TDE but for
> > other reasons as well. If the data is temporary, TDE might want to
> > encrypt it using a temporary key which is generated on the fly and
> > only stored in server memory, but if it's data that needs to be
> > readable after a server restart, we're going to need to use a key that
> > we'll still have access to in the future. Or maybe that particular
> > thing isn't relevant, but I just can't believe that every single I/O
> > needs exactly the same treatment, so there have got to be some
> > patterns here that do matter. And I can't really tell whether this
> > patch set has a theory about that and, if so, what it is.
> 
> I didn't want to use this API for relations (file reads/writes for these only
> affects a couple of places in md.c), nor for WAL. The intention was to use the
> API for temporary files, and also for other kinds of "auxiliary" files where
> the current coding is rather verbose but not too special.
> 
> To enable the encryption, we'd need to add an extra argument (structure) to
> the appropriate function, which provides the necessary information on the
> encryption, such as cipher kind (block/stream), the encryption key or the
> nonce. I just don't understand whether the TDE alone (4) justifies patch like
> this, so I tried to make more use of it.

Yes, I assumed a parameter would be added to each call site to indicate
the type of key to be used for encryption, or no encryption.  If these
are in performance-critical places, we can avoid the additional level of
calls by adding an 'if (tde)' check to only do the indirection when
necessary.  We can also skip the indirection for files we know will not
need to be encrypted, e.g., postgresql.conf.

I can also imagine needing to write the data encrypted without
encrypting the in-memory version, so centralizing that code in one place
would be good.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson




Re: Temporary file access API

From
Robert Haas
Date:
On Mon, Aug 8, 2022 at 2:26 PM Antonin Houska <ah@cybertec.at> wrote:
> > I don't think that (3) is a separate advantage from (1) and (2), so I
> > don't have anything separate to say about it.
>
> I thought that the uncontrollable buffer size is one of the things you
> complaint about in
>
> https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com

Well, I think that email is mostly complaining about there being no
buffering at all in a situation where it would be advantageous to do
some buffering. But that particular code I believe is gone now because
of the shared-memory stats collector, and when looking through the
patch set, I didn't see that any of the cases that it touched were
similar to that one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Temporary file access API

From
John Morris
Date:

I’m a latecomer to the discussion, but as a word of introduction, I’m working with Stephen, and I have started looking over the temp file proposal with the idea of helping it move along.

 

I’ll start by summarizing the temp file proposal and its goals.

 

From a high level, the proposed code:

  • Creates an fread/fwrite replacement (BufFileStream) for buffering data to a single file.
  • Updates BufFile, which reads/writes a set of files, to use BufFileStream internally.
  • Does not impact the normal (page cached) database I/O.
  • Updates all the other places where fread/fwrite and read/write are used.
  • Creates and removes transient files.

I see the following goals:

  • Unify all the “other” file accesses into a single, consistent API.
  • Integrate with VFDs.
  • Integrate transient files with transactions and tablespaces.
  • Create a consolidated framework where features like encryption and compression can be more easily added.
  • Maintain good streaming performance.

Is this a fair description of the proposal?

 

For myself, I’d like to map out how features like compression and encryption would fit into the framework, more as a sanity check than anything else, and I’d like to look closer at some of the implementation details. But at the moment, I want to make sure I have the proper high-level view of the temp file proposal.

 

 

From: Robert Haas <robertmhaas@gmail.com>
Date: Wednesday, September 21, 2022 at 11:54 AM
To: Antonin Houska <ah@cybertec.at>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>, Stephen Frost <sfrost@snowman.net>
Subject: Re: Temporary file access API

On Mon, Aug 8, 2022 at 2:26 PM Antonin Houska <ah@cybertec.at> wrote:
> > I don't think that (3) is a separate advantage from (1) and (2), so I
> > don't have anything separate to say about it.
>
> I thought that the uncontrollable buffer size is one of the things you
> complaint about in
>
> https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com

Well, I think that email is mostly complaining about there being no
buffering at all in a situation where it would be advantageous to do
some buffering. But that particular code I believe is gone now because
of the shared-memory stats collector, and when looking through the
patch set, I didn't see that any of the cases that it touched were
similar to that one.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Temporary file access API

From
Antonin Houska
Date:
Hi,

John Morris <john@precision-gps.com> wrote:

> I’m a latecomer to the discussion, but as a word of introduction, I’m working with Stephen, and I have started
lookingover the temp file proposal with the idea of helping it move along. 
>
> I’ll start by summarizing the temp file proposal and its goals.
>
> From a high level, the proposed code:
>
> * Creates an fread/fwrite replacement (BufFileStream) for buffering data to a single file.
>
> * Updates BufFile, which reads/writes a set of files, to use BufFileStream internally.
>
> * Does not impact the normal (page cached) database I/O.
>
> * Updates all the other places where fread/fwrite and read/write are used.

Not really all, just those where the change seemed reasonable (i.e. where it
does not make the code more complex)

> * Creates and removes transient files.

The "stream API" is rather an additional layer on top of files that user needs
to create / remove at lower level.

> I see the following goals:
>
> * Unify all the “other” file accesses into a single, consistent API.
>
> * Integrate with VFDs.
>
> * Integrate transient files with transactions and tablespaces.

If you mean automatic closing/deletion of files on transaction end, this is
also the lower level thing that I didn't try to change.

> * Create a consolidated framework where features like encryption and compression can be more easily added.
>
> * Maintain good streaming performance.
>
> Is this a fair description of the proposal?

Basically that's it.

> For myself, I’d like to map out how features like compression and encryption would fit into the framework, more as a
sanitycheck than anything else, and I’d like to look closer at some of the implementation details. But at the moment, I
wantto make sure I have the 
> proper high-level view of the temp file proposal.

I think the high level design (i.e. how the API should be used) still needs
discussion. In particular, I don't know whether it should aim at the
encryption adoption or not. If it does, then it makes sense to base it on
buffile.c, because encryption essentially takes place in memory. But if
buffering itself (w/o encryption) is not really useful at other places (see
Robert's comments in [1]), then we can design something simpler, w/o touching
buffile.c (which, in turn, won't be usable for encryption, compression or so).

So I think that code simplification and easy adoption of in-memory data
changes (such as encryption or compression) are two rather distinct goals.
admit that I'm running out of ideas how to develop a framework that'd be
useful for both.

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZWP8UtkNVLd75Qqoh9VGOVy_0xBK%2BSHZAdNvnfaikKsQ%40mail.gmail.com


> From: Robert Haas <robertmhaas@gmail.com>
> Date: Wednesday, September 21, 2022 at 11:54 AM
> To: Antonin Houska <ah@cybertec.at>
> Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>,
StephenFrost <sfrost@snowman.net> 
> Subject: Re: Temporary file access API
>
> On Mon, Aug 8, 2022 at 2:26 PM Antonin Houska <ah@cybertec.at> wrote:
> > > I don't think that (3) is a separate advantage from (1) and (2), so I
> > > don't have anything separate to say about it.
> >
> > I thought that the uncontrollable buffer size is one of the things you
> > complaint about in
> >
> > https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com
>
> Well, I think that email is mostly complaining about there being no
> buffering at all in a situation where it would be advantageous to do
> some buffering. But that particular code I believe is gone now because
> of the shared-memory stats collector, and when looking through the
> patch set, I didn't see that any of the cases that it touched were
> similar to that one.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Temporary file access API

From
John Morris
Date:
  • So I think that code simplification and easy adoption of in-memory data
    changes (such as encryption or compression) are two rather distinct goals.
    admit that I'm running out of ideas how to develop a framework that'd be
    useful for both.

 

I’m also wondering about code simplification vs a more general encryption/compression framework. I started with the current proposal, and I’m also looking at David Steele’s approach to encryption/compression in pgbackrest. I’m beginning to think we should do a high-level design which includes encryption/compression, and then implement it as a “simplification” without actually doing the transformations.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Re: Temporary file access API

From
Michael Paquier
Date:
On Tue, Sep 27, 2022 at 04:22:39PM +0000, John Morris wrote:
> I’m also wondering about code simplification vs a more general
> encryption/compression framework. I started with the current
> proposal, and I’m also looking at David Steele’s approach to
> encryption/compression in pgbackrest. I’m beginning to think we
> should do a high-level design which includes encryption/compression,
> and then implement it as a “simplification” without actually doing
> the transformations.

It sounds to me like the proposal of this thread is not the most
adapted thing, so I have marked it as RwF as this is the status of 2~3
weeks ago..  If you feel that I am wrong, feel free to scream after
me.
--
Michael

Attachment