Re: Temporary file access API - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Temporary file access API
Msg-id 3146.1649408068@antos
In response to Re: Temporary file access API  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Temporary file access API
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, Mar 8, 2022 at 6:12 AM Antonin Houska <ah@cybertec.at> wrote:
> > Thanks for your comments, the initial version is attached here.
>
> I've been meaning to look at this thread for some time but have not
> found enough time to do that until just now. And now I have
> questions...
>
> 1. Supposing we accepted this, how widely do you think that we could
> adopt it? I see that this patch set adapts a few places to use it and
> that's nice, but I have a feeling there's a lot more places that are
> making use of system calls directly, or through some other
> abstraction, than just this. I'm not sure that we actually want to use
> something like this everywhere, but what would be the rule for
> deciding where to use it and where not to use it? If the plan for
> this facility is just to adapt these two particular places to use it,
> that doesn't feel like enough to be worthwhile.

Admittedly I viewed the problem from the perspective of the TDE, so I haven't
spent much time looking for other opportunities. Now, with the stats collector
using shared memory, even one of the use cases implemented here no longer
exists. I need to do more research.

Do you think that the use of a system call is a problem in itself (e.g. because
the code looks less simple if read/write is used in some places and
fread/fwrite in others; of course, read/write is mandatory in special cases
like WAL, heap pages, etc.), or is the problem that the system calls are used
too frequently? I suppose only the latter.

Anyway, I'm not sure there are *many* places where system calls are used too
frequently. Instead, the code is usually written so that the information is
first assembled in memory and then written to the file at once. So the value of
the (buffered) stream is that it makes the code simpler (it eliminates the need
to prepare the data in memory). That's what I tried to do for reorderbuffer.c
and pgstat.c in my patch.
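
To make that point concrete, here is a minimal sketch of a buffered write
stream on top of a file descriptor. It is not the code from the patch: the
SketchStream type, the sketch_* functions and the 8 kB buffer size are all
names and values made up for illustration. The caller appends small records
one by one, and write(2) is only issued when the buffer fills up or on an
explicit flush.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_BUFSZ 8192		/* assumed buffer size, for illustration */

/* Hypothetical type; NOT the stream type proposed in the patch. */
typedef struct SketchStream
{
	int			fd;				/* destination file descriptor */
	size_t		used;			/* bytes currently buffered */
	long		nsyscalls;		/* write(2) calls issued so far */
	char		buf[SKETCH_BUFSZ];
} SketchStream;

static void
sketch_init(SketchStream *s, int fd)
{
	assert(s != NULL);
	s->fd = fd;
	s->used = 0;
	s->nsyscalls = 0;
}

/* Write out whatever is buffered; returns 0 on success, -1 on error. */
static int
sketch_flush(SketchStream *s)
{
	size_t		done = 0;

	while (done < s->used)
	{
		ssize_t		n = write(s->fd, s->buf + done, s->used - done);

		if (n <= 0)
			return -1;			/* a real version would handle EINTR etc. */
		s->nsyscalls++;
		done += (size_t) n;
	}
	s->used = 0;
	return 0;
}

/* Append len bytes; the buffer is flushed only when it becomes full. */
static int
sketch_write(SketchStream *s, const void *data, size_t len)
{
	const char *p = data;

	while (len > 0)
	{
		size_t		room = SKETCH_BUFSZ - s->used;
		size_t		chunk = len < room ? len : room;

		memcpy(s->buf + s->used, p, chunk);
		s->used += chunk;
		p += chunk;
		len -= chunk;
		if (s->used == SKETCH_BUFSZ && sketch_flush(s) < 0)
			return -1;
	}
	return 0;
}
```

With something like this, the caller can emit each field or record with
sketch_write() as it is produced and call sketch_flush() once at the end,
instead of building the whole image in memory first.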

A related question is whether we should try to replace the libc stream
(FILE *) in some places. You seem to suggest that in [1]. One example is
snapmgr.c:ExportSnapshot(), if we also implement output formatting. Of course
there are places where (FILE *) cannot be replaced because, besides regular
files, the code needs to work with stdin/stdout in general. (Parsing of
configuration files falls into this category, but that doesn't matter because
the bison-generated parser seems to implement buffering anyway.)

> 2. What about frontend code? Most frontend code won't examine data
> files directly, but at least pg_controldata, pg_checksums, and
> pg_resetwal are exceptions.

If the frequency of using system calls is the problem, then I wouldn't change
these, because the ControlFileData structure needs to be initialized in memory
anyway and is then written at once. And pg_checksums reads whole blocks
anyway. I'll take a closer look.

> 3. How would we extend this to support encryption? Add an extra
> argument to BufFileStreamInit(V)FD passing down the encryption
> details?

Yes.
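
Roughly, and only as a sketch under my own assumptions (none of the names
below exist in the patch; SketchCryptStream, sketch_transform_fn and the
sketch_crypt_* functions are hypothetical), the extra argument could be a
callback plus an opaque context that the stream applies to the buffer just
before it is written out. The XOR callback is a placeholder to show the data
flow; it is not encryption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical hook: transform len bytes in place before they are written. */
typedef void (*sketch_transform_fn) (void *ctx, char *data, size_t len);

typedef struct SketchCryptStream
{
	int			fd;
	sketch_transform_fn transform;	/* NULL means plain output */
	void	   *transform_ctx;		/* key material etc.; opaque to the stream */
	size_t		used;
	char		buf[8192];
} SketchCryptStream;

/* The "encryption details" are passed once, at initialization time. */
static void
sketch_crypt_init(SketchCryptStream *s, int fd,
				  sketch_transform_fn fn, void *ctx)
{
	assert(s != NULL);
	s->fd = fd;
	s->transform = fn;
	s->transform_ctx = ctx;
	s->used = 0;
}

/* Transform the buffered bytes (if a hook is set), then write them out. */
static int
sketch_crypt_flush(SketchCryptStream *s)
{
	size_t		done = 0;

	if (s->transform != NULL)
		s->transform(s->transform_ctx, s->buf, s->used);
	while (done < s->used)
	{
		ssize_t		n = write(s->fd, s->buf + done, s->used - done);

		if (n <= 0)
			return -1;			/* sketch only: no retry, buffer is lost */
		done += (size_t) n;
	}
	s->used = 0;
	return 0;
}

/* Append len bytes in clear text; they are transformed only at flush time. */
static int
sketch_crypt_write(SketchCryptStream *s, const void *data, size_t len)
{
	const char *p = data;

	while (len > 0)
	{
		size_t		room = sizeof(s->buf) - s->used;
		size_t		chunk = len < room ? len : room;

		memcpy(s->buf + s->used, p, chunk);
		s->used += chunk;
		p += chunk;
		len -= chunk;
		if (s->used == sizeof(s->buf) && sketch_crypt_flush(s) < 0)
			return -1;
	}
	return 0;
}

/* Placeholder only: XOR with a one-byte key is NOT encryption. */
static void
sketch_xor(void *ctx, char *data, size_t len)
{
	unsigned char key = *(unsigned char *) ctx;

	for (size_t i = 0; i < len; i++)
		data[i] ^= key;
}
```

The callers of the write function would not change at all; only the
initialization call would carry the additional information.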

> There are some smaller things about the patch with which I'm not 100%
> comfortable, but I'd like to start by understanding the big picture.

Thanks!

[1] https://www.postgresql.org/message-id/CA+TgmoYGjN_f=FCErX49bzjhNG+GoctY+a+XhNRWCVvDY8U74w@mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


