Re: Pluggable toaster - Mailing list pgsql-hackers

From	Nikita Malakhov
Subject	Re: Pluggable toaster
Date	October 24, 2022 14:44:35
Msg-id	CAN-LCVPv-aTC5iH4=nx_q4rymkLt5RXWM=qP9smczmFiwMmwJw@mail.gmail.com
In response to	Re: Pluggable toaster (Aleksander Alekseev <aleksander@timescale.com>)
Responses	Re: Pluggable toaster
List	pgsql-hackers

Hi!

>From personal experience with the project I have serious doubts this
>is going to happen. Before such invasive changes are going to be
>accepted there should be a clear understanding of how exactly TOASTers
>are supposed to be used. This should be part of the documentation in
>the patchset. Additionally there should be an open-source or
>source-available extension that actually demonstrates the benefits of
>TOASTers with reproducible benchmarks (we didn't even get to that part
>yet).

Actually, there is a documentation part in the patchset. There is also a
README file explaining the API.

In addition, we have several custom TOAST implementations with some
results; they were published and presented at PGCon. I was asked to exclude
the custom TOAST implementations and some further improvements from the first
iteration, which is why the patchset currently consists of only 3 patches: the
base core changes, the default TOAST implementation via the TOAST API, and the
documentation package.

>What other use cases for TOAST do you have in mind?

The main use case is the same as for the TOAST mechanism: storing and
retrieving oversized data. But we expanded this case with some details:

- update TOASTed data (yes, the current TOAST implementation cannot update
stored data; it marks the whole TOASTed object as dead and stores a new one);
- retrieve part of the stored data chunks without fully de-TOASTing the stored
data (even with the existing TOAST this will be painful if you have to get just
a small part of an object several hundred MB in size);
- be able to store objects larger than 1 GB;
- store more than 4 TB of TOASTed data for one table;
- optimize storage for fast search and retrieval of parts of a TOASTed object;
this is a must-have for using JSON effectively, and PostgreSQL is already
playing catch-up in JSON performance.

For all these cases we have test results that show improvements in storage and performance.
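
To illustrate the partial-retrieval case above, here is a minimal sketch of
which chunks a reader would need to touch for a byte slice of an uncompressed,
chunked value. It is not code from the patchset: the names
(SKETCH_CHUNK_SIZE, ChunkRange, chunk_range_for_slice) are hypothetical, and
the chunk size is an assumed round 2000 bytes, close to the default described
in the PostgreSQL documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed chunk payload size in bytes; illustrative, not the real constant. */
#define SKETCH_CHUNK_SIZE 2000u

/* Hypothetical result type: the chunks that cover a requested byte slice. */
typedef struct ChunkRange
{
    uint64_t first_chunk;   /* sequence number of the first chunk needed */
    uint64_t last_chunk;    /* sequence number of the last chunk needed */
    uint32_t skip_in_first; /* bytes to skip at the start of the first chunk */
} ChunkRange;

/*
 * Compute the chunks covering bytes [offset, offset + length) of a value
 * stored as fixed-size chunks. Only these chunks have to be fetched; the
 * rest of the value is never read.
 */
ChunkRange
chunk_range_for_slice(uint64_t offset, uint64_t length)
{
    ChunkRange r;

    assert(length > 0);     /* an empty slice needs no chunks at all */

    r.first_chunk = offset / SKETCH_CHUNK_SIZE;
    r.last_chunk = (offset + length - 1) / SKETCH_CHUNK_SIZE;
    r.skip_in_first = (uint32_t) (offset % SKETCH_CHUNK_SIZE);
    return r;
}
```

Under these assumptions, a 100-byte slice at offset 5000 touches a single
chunk (number 2), however large the whole object is. This only works when the
stored bytes map directly to chunk positions, which is generally not the case
for compressed data.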

>To clarify, the concern about "N TOASTers vs M TableAM" was expressed
>by Robert Haas back in Jan 2022:

>> I agree ... but I'm also worried about what happens when we have
>> multiple table AMs. One can imagine a new table AM that is
>> specifically optimized for TOAST which can be used with an existing
>> heap table. One can imagine a new table AM for the main table that
>> wants to use something different for TOAST. So, I don't think it's
>> right to imagine that the choice of TOASTer depends solely on the
>> column data type. I'm not really sure how this should work exactly ...
>> but it needs careful thought.

>This is the most important open question so far to my knowledge. It
>was never addressed, it doesn't seem like there is a plan of doing so,
>the suggested alternative approach was ignored, nor are there any
>strong arguments that would defend this design choice and/or criticize
>the alternative one (other than general words "don't worry we know
>what we are doing").

>This is what I mean by the community feedback being discarded.

Maybe there was some misunderstanding. I was new to this project and company
at that time (I was introduced to it in the middle of December 2021), but
Teodor Sigaev gave an answer to Mr. Haas:

>Right. that's why we propose a validate method (may be, it's a wrong
>name, but I don't know a better one) which accepts several arguments, one
>of which is table AM oid. If that method returns false then toaster
>isn't useful with current TAM, storage or/and compression kinds, etc.

And this is a generalized means, correct from the OOP point of view, of
ensuring that a concrete TOAST implementation is valid for the Table AM
calling it.
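
As a rough sketch of the idea (not the actual API from the patchset), a
validate callback could look like the following. All names here are
hypothetical (ToasterSketch, toaster_validate_fn, heap_only_validate,
toaster_usable), the argument list is reduced to the table AM OID and the
storage kind, and the value 2 for the heap AM OID is an assumption about a
stock PostgreSQL catalog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid type */

#define SKETCH_HEAP_AM_OID 2u   /* assumed OID of the "heap" table AM */

/* Hypothetical callback: can this toaster serve the given table AM/storage? */
typedef bool (*toaster_validate_fn) (Oid table_am_oid, char storage);

/* Hypothetical toaster descriptor, reduced to what the sketch needs. */
typedef struct ToasterSketch
{
    const char *name;
    toaster_validate_fn validate;
} ToasterSketch;

/*
 * Example policy: accept only heap tables, and only the storage kinds that
 * allow out-of-line storage ('e' = external, 'x' = extended).
 */
bool
heap_only_validate(Oid table_am_oid, char storage)
{
    if (table_am_oid != SKETCH_HEAP_AM_OID)
        return false;
    return storage == 'e' || storage == 'x';
}

/* The caller asks the toaster itself whether the combination is valid. */
bool
toaster_usable(const ToasterSketch *toaster, Oid table_am_oid, char storage)
{
    assert(toaster != NULL && toaster->validate != NULL);
    return toaster->validate(table_am_oid, storage);
}
```

The point of the design is that the caller does not need to know which table
AMs a toaster supports; each toaster answers that question for itself, so
adding a new table AM or a new toaster does not require changing the caller.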

On Mon, Oct 24, 2022 at 4:53 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Using Table AM Routine and routing AM methods calls via it is a topic for further discussion,
> if Pluggable TOAST will be committed. [...] And even then it would be an open issue.

From personal experience with the project I have serious doubts this
is going to happen. Before such invasive changes are going to be
accepted there should be a clear understanding of how exactly TOASTers
are supposed to be used. This should be part of the documentation in
the patchset. Additionally there should be an open-source or
source-available extension that actually demonstrates the benefits of
TOASTers with reproducible benchmarks (we didn't even get to that part
yet).

> TOAST implementation is not necessary for Table AM.

What other use cases for TOAST do you have in mind?

>> > Have I answered your question? Please don't hesitate to point to any unclear
>> > parts, I'd be glad to explain that.
>>
>> No. To be honest, it looks like you are merely discarding most/any
>> feedback the community provided so far.
>>
>> I really think that pluggable TOASTers would be a great feature.
>> However if the goal is to get it into the core I doubt that we are
>> going to make much progress with the current approach.

To clarify, the concern about "N TOASTers vs M TableAM" was expressed
by Robert Haas back in Jan 2022:

> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.

This is the most important open question so far to my knowledge. It
was never addressed, it doesn't seem like there is a plan of doing so,
the suggested alternative approach was ignored, nor are there any
strong arguments that would defend this design choice and/or criticize
the alternative one (other than general words "don't worry we know
what we are doing").

This is what I mean by the community feedback being discarded.

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/
