
Re: Pluggable toaster - Mailing list pgsql-hackers

From	Nikita Malakhov
Subject	Re: Pluggable toaster
Date	October 23, 2022 23:38:13
Msg-id	CAN-LCVNaQy04RvbgVtwygmvfPDFSGxLhzUK=DgUjAEiZ9n9Mfw@mail.gmail.com
In response to	Re: Pluggable toaster (Aleksander Alekseev <aleksander@timescale.com>)
Responses	Re: Pluggable toaster (Aleksander Alekseev <aleksander@timescale.com>)
List	pgsql-hackers

Hi!

Aleksander,

>Don't you think that this is an arguable design decision? Basically
>all we know about the underlying TableAM is that it stores tuples
>_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
>it even has any sort of pages, whether they are fixed in size or not,
>whether it uses shared buffers, etc. It may not even require TOAST.
>(Not to mention the fact that when you have N TOAST implementations
>and M TableAM implementations now you have to run N x M compatibility
>tests. And this doesn't account for different versions of Ns and Ms,
>different platforms and different versions of PostgreSQL.)

>I believe the proposed approach is architecturally broken from the beginning.

The existing TOAST mechanism works, but for certain types of data it works
very poorly, and, let's face it, it has strict limitations that restrict the
overall capabilities of the DBMS. TOAST was designed when today's data volumes
were not common: I mean tables with hundreds of billions of rows, with sizes
measured in hundreds of GB and even terabytes. But TOAST itself is a good
solution to the problem of storing oversized attributes, and although it has
limitations, it would be unwise to just throw it away. A better way is to
bring it up to date by revising it, removing the most painful limitations, and
allowing different (custom) TOAST strategies for special cases.

The main idea of Pluggable TOAST is to extend TOAST capabilities by providing
a common API that allows different strategies to be applied uniformly when
TOASTing different data. By "TOAST" I mean that the data is stored externally
to the source table, somewhere that only its Toaster knows where and how: it
may be regular Heap tables, Heap tables with a different table structure,
tables of some other AM, files outside of the database, even files on
different storage systems. Pluggable TOAST allows using advanced compression
methods and complex operations on externally stored data, such as search
without fully de-TOASTing the data.
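
To illustrate the "common API" idea, here is a minimal sketch in Python. It is
not the API from the patch: the names Toaster, ChunkToaster, CompressingToaster,
register_toaster, store_attribute and fetch_attribute are all hypothetical, and
in-memory dictionaries stand in for real storage. The only point is that the
calling code talks to every strategy through the same two calls.

```python
import zlib
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class Toaster(ABC):
    """Hypothetical common interface; not the API from the patch."""

    @abstractmethod
    def toast(self, value: bytes) -> bytes:
        """Store the value externally; return a small pointer kept in the row."""

    @abstractmethod
    def detoast(self, pointer: bytes) -> bytes:
        """Return the original value for a pointer produced by toast()."""


class ChunkToaster(Toaster):
    """Splits values into fixed-size chunks (a dict stands in for storage)."""

    def __init__(self, chunk_size: int = 2000) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[Tuple[int, int], bytes] = {}  # (value_id, seq) -> chunk
        self.next_id = 1

    def toast(self, value: bytes) -> bytes:
        value_id = self.next_id
        self.next_id += 1
        for seq, start in enumerate(range(0, len(value), self.chunk_size)):
            self.chunks[(value_id, seq)] = value[start:start + self.chunk_size]
        return str(value_id).encode()

    def detoast(self, pointer: bytes) -> bytes:
        value_id = int(pointer.decode())
        parts = []
        seq = 0
        while (value_id, seq) in self.chunks:
            parts.append(self.chunks[(value_id, seq)])
            seq += 1
        return b"".join(parts)


class CompressingToaster(Toaster):
    """Compresses with zlib, then delegates storage to another toaster."""

    def __init__(self, inner: Toaster) -> None:
        self.inner = inner

    def toast(self, value: bytes) -> bytes:
        return self.inner.toast(zlib.compress(value))

    def detoast(self, pointer: bytes) -> bytes:
        return zlib.decompress(self.inner.detoast(pointer))


# Registry: strategies are plugged in by name, the caller never changes.
TOASTERS: Dict[str, Toaster] = {}


def register_toaster(name: str, toaster: Toaster) -> None:
    TOASTERS[name] = toaster


def store_attribute(toaster_name: str, value: bytes) -> Tuple[str, bytes]:
    """What the row keeps: which toaster owns the value, plus its pointer."""
    return toaster_name, TOASTERS[toaster_name].toast(value)


def fetch_attribute(stored: Tuple[str, bytes]) -> bytes:
    name, pointer = stored
    return TOASTERS[name].detoast(pointer)


register_toaster("chunk", ChunkToaster())
register_toaster("zlib", CompressingToaster(ChunkToaster()))
```

For example, store_attribute("zlib", data) followed by fetch_attribute() on the
result gives back data, and the same two calls work unchanged for "chunk".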

Also, the existing TOAST is part of the Heap AM and is restricted to using
Heap only. To make it extensible, we have to separate TOAST from the Heap AM.
The default TOAST in Pluggable TOAST still uses Heap, but Heap knows nothing
about TOAST. It fits perfectly into OOP paradigms.

>It looks like the idea should be actually turned inside out. I.e. what
>would be nice to have is some sort of _framework_ that helps TableAM
>authors to implement TOAST (alternatively, the rest of the TableAM
>except for TOAST) if the TableAM is similar to the default one. In
>other words the idea is not to implement alternative TOASTers that
>will work with all possible TableAMs but rather to simplify the task
>of implementing an alternative TableAM which is similar to the default
>one except for TOAST. These TableAMs should reuse as much common code
>as possible except for the parts where they differ.

To implement different TOAST strategies you must have an API to plug them in;
otherwise you would have to change the core for each strategy. Once the TOAST
API is merged into the core, custom TOAST strategies can be plugged in just by
adding contrib modules. I have to stress that different TOAST strategies do
not have to store data with other TAMs. They can store the data in Heap but
use knowledge of the internal data structure or workflow to store it in a more
optimal way: for example, fast JSON that is compressed and decompressed
partially, or lots of large chunks of binary data stored in the database (as
you know, large objects are not of much help with this), and so on.
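
As a rough illustration of using knowledge of the data structure, the sketch
below stores a JSON document as one compressed entry per top-level key, so a
single key can be read without decompressing the rest. This is only an
assumption about how such a strategy might look, not code from the patch: the
class JsonKeyToaster and its methods are hypothetical, and dictionaries stand
in for the actual storage.

```python
import json
import zlib


class JsonKeyToaster:
    """Hypothetical type-aware toaster: one compressed entry per top-level key."""

    def __init__(self) -> None:
        self.store = {}   # (doc_id, key) -> compressed JSON of that key's value
        self.keys = {}    # doc_id -> list of top-level keys, in original order
        self.next_id = 1

    def toast(self, document: dict) -> int:
        """Store the document key by key; return the pointer kept in the row."""
        doc_id = self.next_id
        self.next_id += 1
        self.keys[doc_id] = list(document)
        for key, value in document.items():
            self.store[(doc_id, key)] = zlib.compress(json.dumps(value).encode())
        return doc_id

    def get_key(self, doc_id: int, key: str):
        """Fetch one key without decompressing the rest of the document."""
        return json.loads(zlib.decompress(self.store[(doc_id, key)]).decode())

    def detoast(self, doc_id: int) -> dict:
        """Rebuild the whole document from its per-key entries."""
        return {key: self.get_key(doc_id, key) for key in self.keys[doc_id]}
```

With a document like {"id": 7, "body": "..."} where "body" is large, get_key()
on "id" touches only the small entry, while detoast() still returns the full
document. A generic chunking strategy cannot offer this because it does not
know where one key ends and the next begins.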

Implementing another Table AM just to implement another TOAST strategy seems
excessive: the TAM API is very heavy and complex, and you would have to add it
as a contrib module. Having lots of different TAMs would cause many more
problems than having lots of Toasters, because such a solution results in data
incompatibility between installations with different TAMs, and a minor change
in a custom TAM contrib module could lead to losing all data stored with that
TAM. With a custom TOAST, in the worst case you could lose just the TOASTed
data and nothing else.

We have lots of requests from clients and tickets related to TOAST limitations
and to extending Postgres this way. This growing need is what made us develop
Pluggable TOAST.

On Sun, Oct 23, 2022 at 12:38 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Pluggable TOAST API was designed with storage flexibility in mind, and Custom TOAST mechanics is
> free to use any storage methods

Don't you think that this is an arguable design decision? Basically
all we know about the underlying TableAM is that it stores tuples
_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
it even has any sort of pages, whether they are fixed in size or not,
whether it uses shared buffers, etc. It may not even require TOAST.
(Not to mention the fact that when you have N TOAST implementations
and M TableAM implementations now you have to run N x M compatibility
tests. And this doesn't account for different versions of Ns and Ms,
different platforms and different versions of PostgreSQL.)

I believe the proposed approach is architecturally broken from the beginning.

It looks like the idea should be actually turned inside out. I.e. what
would be nice to have is some sort of _framework_ that helps TableAM
authors to implement TOAST (alternatively, the rest of the TableAM
except for TOAST) if the TableAM is similar to the default one. In
other words the idea is not to implement alternative TOASTers that
will work with all possible TableAMs but rather to simplify the task
of implementing an alternative TableAM which is similar to the default
one except for TOAST. These TableAMs should reuse as much common code
as possible except for the parts where they differ.

Does it make sense?

Sorry, I realize this will probably imply a complete rewrite of the
patch. This is the reason why one should start proposing changes from
gathering the requirements, writing an RFC and run it through several
rounds of discussion.

[1]: https://www.postgresql.org/docs/current/tableam.html

--
Best regards,
Aleksander Alekseev

Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
