Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Use streaming read API in ANALYZE
Date
Msg-id CA+hUKGJb6EgQDN5dPyvuT7QKrDqNyPgFojXU8F14+8wMp16imA@mail.gmail.com
Whole thread Raw
In response to Re: Use streaming read API in ANALYZE  (Mats Kindahl <mats@timescale.com>)
Responses Re: Use streaming read API in ANALYZE
Re: Use streaming read API in ANALYZE
List pgsql-hackers
On Thu, Sep 5, 2024 at 6:45 PM Mats Kindahl <mats@timescale.com> wrote:
> Forgive me for asking, but I am not entirely sure why the ReadStream struct is opaque. The usual reasons are:
>
> You want to provide an ABI to allow extensions to work with new major versions without re-compiling. Right now it is
necessaryto recompile extensions anyway, this does not seem to apply. (Because there are a lot of other changes that
youneed when switching versions because of the lack of a stable ABI for other parts of the code. However, it might be
thatthe goal is to support it eventually, and then it would make sense to start making structs opaque.) 
> You want to ensure that you can make modifications inside a major version without breaking ABIs and requiring a
re-compile.In this case, you could still follow safe practice of adding new fields last, not relying on the size of the
structfor anything (e.g., no arrays of these structures, just pointers to them), etc. However, if you want to be very
safeand support very drastic changes inside a major version, it needs to be opaque, so this could be the reason. 
>
> Is it either of these reasons, or is there another reason?
>
> Making the ReadStream API non-opaque (that is, moving the definition to the header file) would at least solve our
problem(unless I am mistaken). However, I am ignorant about long-term plans which might affect this, so there might be
agood reason to revert it for reasons I am not aware of. 

The second thing.  Also there are very active plans[1] to change the
internal design of ReadStream in 18, since the goal is to drive true
asynchronous I/O, and the idea of ReadStream was to create a simple
API to let many consumers start using it, so that we can drive
efficient modern system interfaces below that API, so having people
depending on how it works would not be great.

But let's talk about how that would actually look, for example if we
exposed the struct or you took a photocopy of it...  I think your idea
must be something like: if you could access struct ReadStream's
internals, you could replace stream->callback with an interceptor
callback, and if the BlockSampler had been given the fake N + M
relation size, the interceptor could overwrite
stream->ios[next_io_index].op.smgr and return x - N if the intercepted
callback returned x >= N.  (Small detail: need to check
stream->fast_path and use 0 instead or something like that, but maybe
we could change that.)  One minor problem that jumps out is that
read_stream.c could inappropriately merge blocks from the two
relations into one I/O.  Hmm, I guess you'd have to teach the
interceptor not to allow that: if switching between the two relation,
and if the block number would coincide with
stream->pending_read_blocknum + stream->pending_read_nblocks, it would
need to pick a new block instead (interfering with the block sampling
algorithm, but only very rarely).  Is this what you had in mind, or
something else?

(BTW I have a patch to teach read_stream.c about multi-smgr-relation
streams, by adding a different constructor with a different callback
that returns smgr, fork, block instead of just the block, but it
didn't make it into 17.)

[1] https://www.postgresql.org/message-id/flat/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah@brqs62irg4dt



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: meson vs. llvm bitcode files
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: meson vs. llvm bitcode files