Re: Pluggable toaster - Mailing list pgsql-hackers

From Nikita Malakhov
Subject Re: Pluggable toaster
Msg-id CAN-LCVPv-aTC5iH4=nx_q4rymkLt5RXWM=qP9smczmFiwMmwJw@mail.gmail.com
In response to Re: Pluggable toaster  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: Pluggable toaster  (Aleksander Alekseev <aleksander@timescale.com>)
List pgsql-hackers
Hi!

>From personal experience with the project I have serious doubts this
>is going to happen. Before such invasive changes are going to be
>accepted there should be a clear understanding of how exactly TOASTers
>are supposed to be used. This should be part of the documentation in
>the patchset. Additionally there should be an open-source or
>source-available extension that actually demonstrates the benefits of
>TOASTers with reproducible benchmarks (we didn't even get to that part
>yet).

Actually, the patchset already includes a documentation part, and there is also a README file
explaining the API.
In addition, we have several custom TOAST implementations with some
results, which were published and presented at PGCon. I was asked to exclude
the custom TOAST implementations and some further improvements from the first
iteration, which is why the patchset currently consists of only 3 patches: the base
core changes, the default TOAST implementation via the TOAST API, and the documentation
package.

>What other use cases for TOAST do you have in mind?

The main use case is the same as for the existing TOAST mechanism: storing and retrieving
oversized data. But we extended this case with several requirements:
- update TOASTed data (yes, the current TOAST implementation cannot update stored
data; it marks the whole TOASTed object as dead and stores a new one);
- retrieve part of the stored data chunks without fully de-TOASTing the stored data (even
with the existing TOAST this is painful if you need just a small part of an object
several hundred MB in size);
- be able to store objects larger than 1 GB;
- store more than 4 TB of TOASTed data for one table;
- optimize storage for fast search and retrieval of parts of a TOASTed object. This is a
must-have for using JSON effectively; PostgreSQL is already playing catch-up
in JSON performance.

For all these cases we have test results that show improvements in storage and performance.
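
To make the partial-retrieval case from the list above concrete, here is a
minimal sketch of the arithmetic involved: given a byte offset and length
inside a chunked value, which chunks have to be read at all. The names
SKETCH_CHUNK_SIZE, ChunkRange and chunks_for_slice are made up for this
illustration and are not taken from the patchset, and the chunk size is an
assumed value of about 2 kB.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed chunk payload size in bytes (illustrative, roughly 2 kB). */
#define SKETCH_CHUNK_SIZE 1996

/* Hypothetical result type: the chunk numbers a slice touches. */
typedef struct ChunkRange
{
    uint32_t first_chunk;    /* first chunk that holds requested bytes */
    uint32_t last_chunk;     /* last chunk that holds requested bytes */
    uint32_t skip_in_first;  /* bytes to skip inside the first chunk */
} ChunkRange;

/*
 * Map a byte range [offset, offset + length) of a chunked value onto
 * chunk numbers. Only these chunks need to be fetched; every chunk
 * outside the range can be left untouched.
 */
ChunkRange
chunks_for_slice(uint64_t offset, uint64_t length)
{
    ChunkRange r;

    assert(length > 0);      /* an empty slice touches no chunk */

    r.first_chunk = (uint32_t) (offset / SKETCH_CHUNK_SIZE);
    r.last_chunk = (uint32_t) ((offset + length - 1) / SKETCH_CHUNK_SIZE);
    r.skip_in_first = (uint32_t) (offset % SKETCH_CHUNK_SIZE);
    return r;
}
```

With these assumptions, a 100-byte slice at offset 4000 touches only chunk 2,
while a 10-byte slice at offset 1990 straddles chunks 0 and 1.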

>To clarify, the concern about "N TOASTers vs M TableAM" was expressed
>by Robert Haas back in Jan 2022:

>> I agree ... but I'm also worried about what happens when we have
>> multiple table AMs. One can imagine a new table AM that is
>> specifically optimized for TOAST which can be used with an existing
>> heap table. One can imagine a new table AM for the main table that
>> wants to use something different for TOAST. So, I don't think it's
>> right to imagine that the choice of TOASTer depends solely on the
>> column data type. I'm not really sure how this should work exactly ...
>> but it needs careful thought.

>This is the most important open question so far to my knowledge. It
>was never addressed, it doesn't seem like there is a plan of doing so,
>the suggested alternative approach was ignored, nor are there any
>strong arguments that would defend this design choice and/or criticize
>the alternative one (other than general words "don't worry we know
>what we are doing").

>This is what I mean by the community feedback being discarded.

Maybe there was some misunderstanding. I was new to this project and
company at that time (I was introduced to it in the middle of December
2021), but Theodor Sigaev gave an answer to Mr. Haas:

>Right. That's why we propose a validate method (maybe it's the wrong
>name, but I don't know a better one) which accepts several arguments, one
>of which is the table AM oid. If that method returns false then the toaster
>isn't useful with the current TAM, storage and/or compression kinds, etc.

And this is a generalized means, correct from the OOP point of view, of
ensuring that a concrete TOAST implementation is valid for the Table AM
calling it.
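
As a rough illustration of the validate idea described above, the sketch
below shows a toaster exposing a callback that receives the table AM oid
(among other arguments) and returns false when it cannot work with that
combination. Everything here is an assumption made for the example: the
types SketchOid and SketchToaster, the functions heap_only_validate and
toaster_usable, the argument list, and the AM oid constant are hypothetical
and are not the signatures used in the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchOid;                 /* stand-in for an object id */

/* Illustrative oid for the one table AM this example toaster supports. */
#define SKETCH_HEAP_AM_OID ((SketchOid) 2)

/* Hypothetical validate callback: true means "usable in this setup". */
typedef bool (*sketch_validate_fn) (SketchOid type_oid, char storage,
                                    char compression, SketchOid table_am_oid);

/* Hypothetical toaster descriptor holding only what the example needs. */
typedef struct SketchToaster
{
    const char *name;
    sketch_validate_fn validate;
} SketchToaster;

/* Example toaster that only agrees to work with one table AM. */
bool
heap_only_validate(SketchOid type_oid, char storage,
                   char compression, SketchOid table_am_oid)
{
    (void) type_oid;            /* unused in this sketch */
    (void) compression;         /* unused in this sketch */

    if (storage == 'p')         /* plain storage is never moved out of line */
        return false;
    return table_am_oid == SKETCH_HEAP_AM_OID;
}

/* Caller side: ask the toaster before assigning it to a column. */
bool
toaster_usable(const SketchToaster *t, SketchOid type_oid, char storage,
               char compression, SketchOid table_am_oid)
{
    assert(t != NULL && t->validate != NULL);
    return t->validate(type_oid, storage, compression, table_am_oid);
}
```

The point of the design is that the table AM does not need to know anything
about individual toasters: each toaster decides for itself whether it is
compatible, so N toasters and M table AMs do not require N x M special cases
in the core.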


On Mon, Oct 24, 2022 at 4:53 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi Nikita,

> Using Table AM Routine and routing AM methods calls via it is a topic for further discussion,
> if Pluggable TOAST will be committed. [...] And even then it would be an open issue.

From personal experience with the project I have serious doubts this
is going to happen. Before such invasive changes are going to be
accepted there should be a clear understanding of how exactly TOASTers
are supposed to be used. This should be part of the documentation in
the patchset. Additionally there should be an open-source or
source-available extension that actually demonstrates the benefits of
TOASTers with reproducible benchmarks (we didn't even get to that part
yet).

> TOAST implementation is not necessary for Table AM.

What other use cases for TOAST do you have in mind?

>> > Have I answered your question? Please don't hesitate to point to any unclear
>> > parts, I'd be glad to explain that.
>>
>> No. To be honest, it looks like you are merely discarding most/any
>> feedback the community provided so far.
>>
>> I really think that pluggable TOASTers would be a great feature.
>> However if the goal is to get it into the core I doubt that we are
>> going to make much progress with the current approach.

To clarify, the concern about "N TOASTers vs M TableAM" was expressed
by Robert Haas back in Jan 2022:

> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.

This is the most important open question so far to my knowledge. It
was never addressed, it doesn't seem like there is a plan of doing so,
the suggested alternative approach was ignored, nor are there any
strong arguments that would defend this design choice and/or criticize
the alternative one (other than general words "don't worry we know
what we are doing").

This is what I mean by the community feedback being discarded.

--
Best regards,
Aleksander Alekseev


--
Regards,
Nikita Malakhov
Postgres Professional 
