Re: Temporary file access API - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Temporary file access API
Date
Msg-id 17018.1663940758@antos
In response to Re: Temporary file access API  (John Morris <john@precision-gps.com>)
Responses Re: Temporary file access API
List pgsql-hackers
Hi,

John Morris <john@precision-gps.com> wrote:

> I’m a latecomer to the discussion, but as a word of introduction, I’m working with Stephen, and I have started looking over the temp file proposal with the idea of helping it move along.
>
> I’ll start by summarizing the temp file proposal and its goals.
>
> From a high level, the proposed code:
>
> * Creates an fread/fwrite replacement (BufFileStream) for buffering data to a single file.
>
> * Updates BufFile, which reads/writes a set of files, to use BufFileStream internally.
>
> * Does not impact the normal (page cached) database I/O.
>
> * Updates all the other places where fread/fwrite and read/write are used.

Not really all, just those where the change seemed reasonable (i.e. where it
does not make the code more complex).

> * Creates and removes transient files.

The "stream API" is rather an additional layer on top of files that the user
needs to create / remove at a lower level.
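
To make the layering concrete, here is a minimal sketch, assuming a POSIX system. It is not code from the patch: DemoStream and the demo_* functions are hypothetical names, and mkstemp()/unlink() merely stand in for whatever lower-level facility creates and removes the file. The point is only that the stream buffers writes to a descriptor it neither opens nor closes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEMO_BUFSZ 8192

/* Hypothetical stream: buffers writes to a descriptor it does not own. */
typedef struct DemoStream
{
    int     fd;                 /* opened and closed by the caller */
    size_t  used;               /* bytes currently buffered */
    char    buf[DEMO_BUFSZ];
} DemoStream;

static void
demo_stream_init(DemoStream *s, int fd)
{
    assert(s != NULL && fd >= 0);
    s->fd = fd;
    s->used = 0;
}

/* Write out the buffered bytes; returns 0 on success, -1 on error. */
static int
demo_stream_flush(DemoStream *s)
{
    size_t  done = 0;

    while (done < s->used)
    {
        ssize_t n = write(s->fd, s->buf + done, s->used - done);

        if (n < 0)
            return -1;
        done += (size_t) n;
    }
    s->used = 0;
    return 0;
}

/* Copy data into the buffer, flushing whenever it fills up. */
static int
demo_stream_write(DemoStream *s, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0)
    {
        size_t  room = DEMO_BUFSZ - s->used;
        size_t  n = len < room ? len : room;

        memcpy(s->buf + s->used, p, n);
        s->used += n;
        p += n;
        len -= n;
        if (s->used == DEMO_BUFSZ && demo_stream_flush(s) < 0)
            return -1;
    }
    return 0;
}

/*
 * Usage: the caller handles creation and removal, the stream only buffers.
 * Returns the number of bytes that reached the file, or -1 on error.
 */
static long
demo_roundtrip(const char *msg)
{
    char        path[] = "/tmp/demo_stream_XXXXXX";
    int         fd = mkstemp(path);     /* lower level: create */
    DemoStream  s;
    off_t       size;

    if (fd < 0)
        return -1;
    demo_stream_init(&s, fd);
    if (demo_stream_write(&s, msg, strlen(msg)) < 0 ||
        demo_stream_flush(&s) < 0)
        size = -1;
    else
        size = lseek(fd, 0, SEEK_END);
    close(fd);                          /* lower level: close */
    unlink(path);                       /* lower level: remove */
    return (long) size;
}
```

In this sketch demo_roundtrip() plays the role of the caller: the three "lower level" calls stay outside the stream, which is why file lifetime is a separate question from the buffering layer.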

> I see the following goals:
>
> * Unify all the “other” file accesses into a single, consistent API.
>
> * Integrate with VFDs.
>
> * Integrate transient files with transactions and tablespaces.

If you mean automatic closing/deletion of files on transaction end, this is
also a lower-level thing that I didn't try to change.

> * Create a consolidated framework where features like encryption and compression can be more easily added.
>
> * Maintain good streaming performance.
>
> Is this a fair description of the proposal?

Basically that's it.

> For myself, I’d like to map out how features like compression and encryption would fit into the framework, more as a sanity check than anything else, and I’d like to look closer at some of the implementation details. But at the moment, I want to make sure I have the proper high-level view of the temp file proposal.

I think the high-level design (i.e. how the API should be used) still needs
discussion. In particular, I don't know whether it should aim at the adoption
of encryption or not. If it does, then it makes sense to base it on
buffile.c, because encryption essentially takes place in memory. But if
buffering itself (w/o encryption) is not really useful in other places (see
Robert's comments in [1]), then we can design something simpler, w/o touching
buffile.c (which, in turn, won't be usable for encryption, compression or the
like).
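
As an illustration of why an in-memory transformation pairs naturally with buffering, here is a minimal sketch, assuming a POSIX system. It is not taken from buffile.c or from the patch: XStream, demo_transform and the other names are hypothetical, and the single-byte XOR is only a stand-in for a real cipher or compressor. The transform runs on the stream's private buffer just before each flush, so the caller's data is never modified and the file never sees plaintext.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define XBUFSZ 16   /* tiny on purpose, so that flushes happen often */

/* Hypothetical hook: rewrite len bytes of buf in place before they hit disk. */
typedef void (*demo_transform) (unsigned char *buf, size_t len, void *arg);

typedef struct XStream
{
    int             fd;
    size_t          used;
    demo_transform  transform;  /* NULL means plain buffering */
    void           *arg;
    unsigned char   buf[XBUFSZ];
} XStream;

/* Stand-in for a cipher: XOR every byte with a one-byte key. NOT secure. */
static void
demo_xor(unsigned char *buf, size_t len, void *arg)
{
    unsigned char key = *(unsigned char *) arg;

    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

static int
xstream_flush(XStream *s)
{
    size_t  done = 0;

    /* The transform sees the stream's private copy, never the caller's data. */
    if (s->transform != NULL)
        s->transform(s->buf, s->used, s->arg);
    while (done < s->used)
    {
        ssize_t n = write(s->fd, s->buf + done, s->used - done);

        if (n < 0)
            return -1;
        done += (size_t) n;
    }
    s->used = 0;
    return 0;
}

static int
xstream_write(XStream *s, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len > 0)
    {
        size_t  room = XBUFSZ - s->used;
        size_t  n = len < room ? len : room;

        memcpy(s->buf + s->used, p, n);
        s->used += n;
        p += n;
        len -= n;
        if (s->used == XBUFSZ && xstream_flush(s) < 0)
            return -1;
    }
    return 0;
}

/*
 * Write msg through an XOR-transforming stream, then read the raw file
 * bytes back into raw. Returns the number of bytes read, or -1 on error.
 */
static long
demo_write_transformed(const char *msg, unsigned char key,
                       unsigned char *raw, size_t rawsize)
{
    char     path[] = "/tmp/demo_xstream_XXXXXX";
    int      fd = mkstemp(path);
    XStream  s = {0};
    ssize_t  n = -1;

    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away once fd is closed */
    s.fd = fd;
    s.transform = demo_xor;
    s.arg = &key;
    if (xstream_write(&s, msg, strlen(msg)) == 0 &&
        xstream_flush(&s) == 0 &&
        lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, raw, rawsize);
    close(fd);
    return (long) n;
}
```

Setting transform to NULL in this sketch leaves plain buffering, which is the case whose usefulness is in question. A design that writes the caller's data straight to the file has no private copy on which such a hook could operate.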

So I think that code simplification and easy adoption of in-memory data
changes (such as encryption or compression) are two rather distinct goals. I
admit that I'm running out of ideas for how to develop a framework that'd be
useful for both.

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZWP8UtkNVLd75Qqoh9VGOVy_0xBK%2BSHZAdNvnfaikKsQ%40mail.gmail.com


> From: Robert Haas <robertmhaas@gmail.com>
> Date: Wednesday, September 21, 2022 at 11:54 AM
> To: Antonin Houska <ah@cybertec.at>
> Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>, Stephen Frost <sfrost@snowman.net>
> Subject: Re: Temporary file access API
>
> On Mon, Aug 8, 2022 at 2:26 PM Antonin Houska <ah@cybertec.at> wrote:
> > > I don't think that (3) is a separate advantage from (1) and (2), so I
> > > don't have anything separate to say about it.
> >
> > I thought that the uncontrollable buffer size is one of the things you
> > complained about in
> >
> > https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com
>
> Well, I think that email is mostly complaining about there being no
> buffering at all in a situation where it would be advantageous to do
> some buffering. But that particular code I believe is gone now because
> of the shared-memory stats collector, and when looking through the
> patch set, I didn't see that any of the cases that it touched were
> similar to that one.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


