Re: Pluggable toaster - Mailing list pgsql-hackers

From Nikita Malakhov
Subject Re: Pluggable toaster
Date
Msg-id CAN-LCVN++kdZ5Z2CmuTzjNOGmHyjM-R24oTKrogHn_a=_h-tKA@mail.gmail.com
In response to Re: Pluggable toaster  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: Pluggable toaster
List pgsql-hackers
Hi,

Setting a default Toaster for a table or database is a subject for discussion. There is already a
default Toaster. Also, there is little sense in setting the Jsonb Toaster as the default even for a
table, let alone a database, because the table could contain other TOASTable columns not of JSON type.

To be able to set a custom Toaster as the default for a table, you have to make it work with ALL
TOASTable datatypes, which leads to lots and lots of lines of code, complexity, and difficulty
supporting such a custom Toaster. Custom Toasters are meant to be rather small and to specialize
in some tricky datatypes or workflows.

Custom Toasters will work with EXTENDED storage, but as I answered in a previous email,
there is not much use in it, because they would deal with already compressed data.

>No, encryption is an excellent example of what a TOASTer should NOT
>do. If you are interested in encryption consider joining the "Moving
>forward with TDE" thread [2].

I'm not working on encryption, so maybe it really is an out-of-scope example. Anyway,
good examples are compression and handling data with a known internal structure or special
requirements, like geometric data in PostGIS: a custom PostGIS Toaster, for example, gives a
considerable performance boost.

>But should we really distinguish INSERT and UPDATE cases on this API
>level? It seems to me that TableAM just inserts new tuples. It's
>TOASTers job to figure out whether similar values existed before and
>should or shouldn't be reused. Additionally a particular TOASTer can
>reuse old values between _different_ rows, potentially even from
>different tables. Another reason why in practice there is little use
>of knowing whether the data is INSERTed or UPDATEd.

A TOASTer really SHOULD distinguish insert and update operations. For TOASTed data these
operations affect many tuples, and the AM does know which of them were updated and which
were not. Not using this knowledge is a very serious limitation of the current TOAST: TOAST
mechanics should update only the affected tuples instead of marking the whole record dead
and inserting a new one. This is also an argument against using EXTENDED storage mode:
with compressed data you have no such choice and must drop the whole record.

A correctly implemented UPDATE for TOAST boosts performance and considerably decreases
the size of TOAST tables, along with WAL size. There is no question here: an UPDATE
operation for TOASTed data is a must. Consider updating a 1 GB TOASTed record: with the
current TOAST you would end up with two 1 GB records in the table, one of them dead, and
2 GB in WAL. With an update you would keep the same 1 GB record and write only the diff to WAL.
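A quick worked count puts the 1 GB example in chunk terms. It assumes a chunk payload of 2000 bytes, roughly the default TOAST chunk size described in the PostgreSQL documentation, and ignores tuple headers and WAL record overhead; k is an illustrative number of touched chunks.

```latex
N = \left\lceil \frac{2^{30}\ \text{bytes}}{2000\ \text{bytes/chunk}} \right\rceil
  = \lceil 536\,870.912 \rceil = 536\,871\ \text{chunks}

\text{full rewrite: } N \text{ new chunk tuples, } N \text{ dead ones until VACUUM};
\qquad \text{chunk-level update: } k \ll N
```

For an edit that fits inside a single chunk, k = 1, so the update writes one chunk tuple instead of roughly half a million.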

>Users should be able to DROP extension. I seriously doubt that the
>patch is going to be accepted as long as it has this limitation.

The documentation and previous discussion mention that this operation would lead to the loss
of data TOASTed with this custom Toaster. It was stated as an issue and a subject for further
discussion in previous emails.

--
Regards,
Nikita Malakhov
Postgres Professional 
