Re: Pluggable toaster - Mailing list pgsql-hackers

From: Nikita Malakhov
Subject: Re: Pluggable toaster
Msg-id: CAN-LCVNaQy04RvbgVtwygmvfPDFSGxLhzUK=DgUjAEiZ9n9Mfw@mail.gmail.com
In response to: Re: Pluggable toaster  (Aleksander Alekseev <aleksander@timescale.com>)
Responses: Re: Pluggable toaster  (Aleksander Alekseev <aleksander@timescale.com>)
List: pgsql-hackers

Hi!

Aleksander,
>Don't you think that this is an arguable design decision? Basically
>all we know about the underlying TableAM is that it stores tuples
>_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
>it even has any sort of pages, whether they are fixed in size or not,
>whether it uses shared buffers, etc. It may not even require TOAST.
>(Not to mention the fact that when you have N TOAST implementations
>and M TableAM implementations now you have to run N x M compatibility
>tests. And this doesn't account for different versions of Ns and Ms,
>different platforms and different versions of PostgreSQL.)

>I believe the proposed approach is architecturally broken from the beginning.

The existing TOAST mechanism works, but for certain types of data it works
very poorly and, let's face it, it has strict limitations that restrict the
overall capabilities of the DBMS. TOAST was designed when today's data
volumes were not common: tables with hundreds of billions of rows, with
sizes measured in hundreds of GB and even in terabytes.

But TOAST itself is a good solution to the problem of storing oversized
attributes. Although it has limitations, it would be unwise to just throw it
away. A better way is to bring it up to date by revising it, getting rid of
the most painful limitations, and allowing different (custom) TOAST
strategies for special cases.

The main idea of Pluggable TOAST is to extend TOAST capabilities by providing
a common API that allows different strategies to be used uniformly to TOAST
different data. By "TOAST" I mean that data is stored externally to the source
table, somewhere only its Toaster knows where and how: it may be regular Heap
tables, Heap tables with a different table structure, tables of some other AM,
files outside of the database, or even files on different storage systems.
Pluggable TOAST allows using advanced compression methods and complex
operations on externally stored data, like search without fully de-TOASTing
the data, etc.

Also, the existing TOAST is a part of the Heap AM and is restricted to using
Heap only. To make it extensible, we have to separate TOAST from the Heap AM.
The default TOAST in Pluggable TOAST still uses Heap, but Heap knows nothing
about TOAST. This fits well with OOP principles.
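
As a rough illustration of that separation (only a sketch; every name in it,
such as ToasterRoutine, ToastPointer and mem_toaster, is hypothetical and is
not the API from the patch), the caller sees nothing but a table of callbacks
and an opaque pointer, while the toaster alone decides where the data lives.
Here the "storage" is a toy in-memory array standing in for Heap, files, or
anything else:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: opaque reference kept in the main tuple instead of the value. */
typedef struct ToastPointer {
    size_t id;        /* key understood only by the owning toaster */
    size_t raw_size;  /* original value length in bytes */
} ToastPointer;

/* Hypothetical callback table; the caller never sees the storage behind it. */
typedef struct ToasterRoutine {
    const char *name;
    ToastPointer (*toast)(const void *value, size_t len);
    size_t (*detoast)(ToastPointer ptr, void *out, size_t out_cap);
} ToasterRoutine;

/* Toy toaster: keeps values in a fixed in-memory array. */
#define MEM_SLOTS 16
static void *mem_slots[MEM_SLOTS];
static size_t mem_used = 0;

static ToastPointer mem_toast(const void *value, size_t len)
{
    ToastPointer ptr = { mem_used, len };

    assert(mem_used < MEM_SLOTS);
    mem_slots[mem_used] = malloc(len ? len : 1);
    assert(mem_slots[mem_used] != NULL);
    memcpy(mem_slots[mem_used], value, len);
    mem_used++;
    return ptr;
}

/* Copies the value into out; returns bytes written, or 0 on failure. */
static size_t mem_detoast(ToastPointer ptr, void *out, size_t out_cap)
{
    if (ptr.id >= mem_used || ptr.raw_size > out_cap)
        return 0;
    memcpy(out, mem_slots[ptr.id], ptr.raw_size);
    return ptr.raw_size;
}

static const ToasterRoutine mem_toaster = {
    "mem_toaster", mem_toast, mem_detoast
};

/* Caller side: works with any ToasterRoutine, knows nothing about storage. */
static int roundtrip_ok(const ToasterRoutine *t, const char *s)
{
    char buf[64];
    size_t len = strlen(s) + 1;  /* include the terminating NUL */
    ToastPointer p = t->toast(s, len);
    size_t n = t->detoast(p, buf, sizeof buf);

    return n == len && strcmp(buf, s) == 0;
}
```

A different toaster would only supply another ToasterRoutine; roundtrip_ok()
and any other caller code would stay unchanged.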

>It looks like the idea should be actually turned inside out. I.e. what
>would be nice to have is some sort of _framework_ that helps TableAM
>authors to implement TOAST (alternatively, the rest of the TableAM
>except for TOAST) if the TableAM is similar to the default one. In
>other words the idea is not to implement alternative TOASTers that
>will work with all possible TableAMs but rather to simplify the task
>of implementing an alternative TableAM which is similar to the default
>one except for TOAST. These TableAMs should reuse as much common code
>as possible except for the parts where they differ.

To implement different TOAST strategies you must have an API to plug them in;
otherwise you'd have to change the core for each strategy. Once the API is
merged into the core, it allows plugging in custom TOAST strategies just by
adding contrib modules. I have to point out that different TOAST strategies do
not have to store data with other TAMs. They could store the data in Heap, but
use knowledge of the internal data structure or workflow to store it in a more
optimal way: for example, JSON that is quickly and partially compressed and
decompressed, or lots of large chunks of binary data stored in the database
(as you know, large objects are not of much help with this), and so on.
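
To make the "plug in without changing the core" point concrete, here is a
minimal sketch of a name-based registry. It is an assumption for illustration
only: register_toaster(), lookup_toaster() and the toy stored_size callback
are hypothetical names and do not come from the patch. The core owns the
registry and the lookup; each strategy only adds an entry:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOASTERS 8

/* Hypothetical registry entry; stored_size is a toy stand-in for a strategy. */
typedef struct ToasterEntry {
    const char *name;
    size_t (*stored_size)(size_t raw_len);
} ToasterEntry;

static ToasterEntry registry[MAX_TOASTERS];
static size_t registry_len = 0;

/* Called by a module at load time. Returns 0 on success, -1 if the
 * registry is full or the name is already taken. */
static int register_toaster(const char *name, size_t (*stored_size)(size_t))
{
    size_t i;

    assert(name != NULL && stored_size != NULL);
    if (registry_len == MAX_TOASTERS)
        return -1;
    for (i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return -1;
    registry[registry_len].name = name;
    registry[registry_len].stored_size = stored_size;
    registry_len++;
    return 0;
}

/* Core side: find a strategy by name; NULL if nothing was registered. */
static const ToasterEntry *lookup_toaster(const char *name)
{
    size_t i;

    for (i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}

/* Two toy strategies: store as-is, or pretend to halve the size. */
static size_t default_size(size_t raw_len) { return raw_len; }
static size_t halving_size(size_t raw_len) { return (raw_len + 1) / 2; }
```

Adding a third strategy means one more register_toaster() call from the new
module; neither the registry nor the lookup code is touched.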

Implementing another Table AM just to implement another TOAST strategy seems
excessive: the TAM API is very heavy and complex, and you would have to add it
as a contrib. Lots of different TAMs would cause many more problems than lots
of Toasters, because such a solution results in data incompatibility between
installations with different TAMs, and a minor change in a custom TAM contrib
could lead to losing all data stored with this TAM. With a custom TOAST you
could (in the worst case) lose just the TOASTed data and nothing else.

We have lots of requests from clients and tickets related to TOAST limitations
and to extending Postgres this way. This growing need made us develop
Pluggable TOAST.



On Sun, Oct 23, 2022 at 12:38 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi Nikita,

> Pluggable TOAST API was designed with storage flexibility in mind, and Custom TOAST mechanics is
> free to use any storage methods

Don't you think that this is an arguable design decision? Basically
all we know about the underlying TableAM is that it stores tuples
_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
it even has any sort of pages, whether they are fixed in size or not,
whether it uses shared buffers, etc. It may not even require TOAST.
(Not to mention the fact that when you have N TOAST implementations
and M TableAM implementations now you have to run N x M compatibility
tests. And this doesn't account for different versions of Ns and Ms,
different platforms and different versions of PostgreSQL.)

I believe the proposed approach is architecturally broken from the beginning.

It looks like the idea should be actually turned inside out. I.e. what
would be nice to have is some sort of _framework_ that helps TableAM
authors to implement TOAST (alternatively, the rest of the TableAM
except for TOAST) if the TableAM is similar to the default one. In
other words the idea is not to implement alternative TOASTers that
will work with all possible TableAMs but rather to simplify the task
of implementing an alternative TableAM which is similar to the default
one except for TOAST. These TableAMs should reuse as much common code
as possible except for the parts where they differ.

Does it make sense?

Sorry, I realize this will probably imply a complete rewrite of the
patch. This is the reason why one should start proposing changes from
gathering the requirements, writing an RFC and run it through several
rounds of discussion.

[1]: https://www.postgresql.org/docs/current/tableam.html

--
Best regards,
Aleksander Alekseev


--
Regards,
Nikita Malakhov
Postgres Professional 
