Pluggable toaster - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Pluggable toaster
Msg-id 224711f9-83b7-a307-b17f-4457ab73aa0a@sigaev.ru
List pgsql-hackers
Hi!

We are working on a custom toaster for JSONB [1]. The current TOAST is
universal for all data types, and because of that it has some
disadvantages:
    - "one toast fits all" may not be the best solution for a particular
      type and/or use case
    - it doesn't know the internal structure of the data type, so it
      cannot choose an optimal toast strategy
    - it can't share common parts between different rows, or even
      between versions of the same row

Modifying the current toaster to cover all tasks and cases looks too
complex; moreover, it would not work for custom data types. Postgres
is an extensible database, so why not extend its extensibility even
further and make TOAST pluggable? We propose separating the toaster
from the heap by means of a toaster API, similar to the table AM API.
The following patches apply on top of the patch in [1].

1) 1_toaster_interface_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_interface
   Introduces syntax for storage and a formal toaster API. Adds the
column atttoaster to pg_attribute. By design, this column must not be
an invalid oid for any toastable datatype, i.e. it must hold a valid oid
for any type (not column) with non-plain storage. Since a toaster may
support only particular datatypes, core should check that the toaster
being set is suitable by calling the toaster's validate method. The new
commands can be found in src/test/regress/sql/toaster.sql

The on-disk toast pointer now has one more possible struct,
varatt_custom, with a fixed header and a variable tail that custom
toasters use as storage. The format of the built-in toaster is kept
unchanged to allow simple pg_upgrade logic.
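
To make the "fixed header plus variable tail" idea concrete, here is a
minimal sketch in plain C. It is not the varatt_custom definition from
the patch: the struct name custom_ptr_sketch, its fields (toaster_oid,
raw_size, tail_len) and both helper functions are assumptions made up
for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout, NOT the patch's varatt_custom. */
typedef struct custom_ptr_sketch
{
    /* fixed header, understood by core */
    uint32_t      toaster_oid;  /* which toaster produced this pointer */
    uint32_t      raw_size;     /* size of the original, detoasted value */
    uint32_t      tail_len;     /* number of bytes in tail[] */
    /* variable tail, private to the toaster and opaque to core */
    unsigned char tail[];
} custom_ptr_sketch;

/* Total size in bytes of a pointer carrying tail_len bytes of tail. */
static size_t
custom_ptr_size(uint32_t tail_len)
{
    return offsetof(custom_ptr_sketch, tail) + tail_len;
}

/* Build a pointer; the caller frees the result with free(). */
static custom_ptr_sketch *
custom_ptr_make(uint32_t toaster_oid, uint32_t raw_size,
                const void *tail, uint32_t tail_len)
{
    custom_ptr_sketch *p;

    assert(tail != NULL || tail_len == 0);

    p = malloc(custom_ptr_size(tail_len));
    if (p == NULL)
        return NULL;

    p->toaster_oid = toaster_oid;
    p->raw_size = raw_size;
    p->tail_len = tail_len;
    if (tail_len > 0)
        memcpy(p->tail, tail, tail_len);
    return p;
}
```

The point of the split is that core only needs the header to decide
which toaster to call, while the meaning of the tail is left entirely to
that toaster.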

Since the toaster for a column can be changed during the table's
lifetime, we had two options for handling the drop of a toaster:
    - allow it: once a column's toaster has been changed, we would need
      to re-toast all values, which could be extremely expensive. In any
      case, functions/operators should be ready to work with values
      toasted by different toasters, although every toaster must
      implement the simple toast/detoast operations, which lets any
      existing code work with the new approach. Tracking dependencies
      between toasters and rows looks like a bad idea.
    - disallow drop toaster. We don't believe there will be many
      toasters at the same time (the number of AMs isn't very high
      either, and we don't expect that to change significantly in the
      near future), so prohibiting the drop of a toaster looks
      reasonable.
In this patch set we chose the second option.

The toaster API includes a get_vtable method, which is intended to give
access to custom toaster features that aren't covered by this API. The
idea is that the toaster returns a structure with values and/or
pointers to the toaster's methods, and the caller can use it for
particular purposes, see patch 4). The kind of structure is identified
by a magic number, which should be the first field of the structure.
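
A minimal sketch of the magic-number convention, in plain C. Everything
named here is an assumption for illustration (APPEND_VTABLE_MAGIC,
append_vtable_sketch, sketch_get_vtable, as_append_vtable); the real
get_vtable signature and structures are defined by the patch and may
look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value identifying one kind of vtable. */
#define APPEND_VTABLE_MAGIC 0x41505644u

/* Hypothetical vtable: the magic number is the first field. */
typedef struct append_vtable_sketch
{
    uint32_t magic;
    int    (*append) (int current_len, int extra_len);
} append_vtable_sketch;

/* Toy method: returns the length after appending. */
static int
sketch_append(int current_len, int extra_len)
{
    return current_len + extra_len;
}

static const append_vtable_sketch append_vtable = {
    APPEND_VTABLE_MAGIC,
    sketch_append
};

/* Stand-in for a toaster's get_vtable method. */
static const void *
sketch_get_vtable(void)
{
    return &append_vtable;
}

/*
 * Caller side: read only the leading magic number, and cast to the
 * concrete struct only if it matches. Returns NULL otherwise.
 */
static const append_vtable_sketch *
as_append_vtable(const void *vtable)
{
    uint32_t magic;

    assert(offsetof(append_vtable_sketch, magic) == 0);

    if (vtable == NULL)
        return NULL;
    memcpy(&magic, vtable, sizeof(magic));
    if (magic != APPEND_VTABLE_MAGIC)
        return NULL;
    return (const append_vtable_sketch *) vtable;
}
```

A caller such as a concatenation function could then use the fast path
when as_append_vtable() returns non-NULL and fall back to ordinary
detoast-and-copy otherwise.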

contrib/dummy_toaster is also added to simplify testing.

psql/pg_dump are modified to support the toaster object concept.

2) 2_toaster_default_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_default
The built-in toaster is reimplemented (with some refactoring) using the
toaster API as the generic (or default) toaster. dummy_toaster here is a
minimal working example: it saves the value directly in the toast
pointer and fails if the value is larger than 1kb.
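
The behaviour described for dummy_toaster can be sketched in plain C as
follows. This is not the contrib module's code: the names
dummy_ptr_sketch, dummy_toast and dummy_detoast are made up, 1kb is
assumed to mean 1024 bytes, and a NULL return stands in for the error
a real toaster would raise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: "1kb" is taken as 1024 bytes. */
#define DUMMY_MAX_VALUE_SIZE 1024

/* Hypothetical pointer that carries the whole value inline. */
typedef struct dummy_ptr_sketch
{
    size_t        len;
    unsigned char data[];
} dummy_ptr_sketch;

/*
 * "Toast" a value by copying it into the pointer itself.
 * Returns NULL if the value is too large or allocation fails.
 */
static dummy_ptr_sketch *
dummy_toast(const void *value, size_t len)
{
    dummy_ptr_sketch *p;

    assert(value != NULL || len == 0);

    if (len > DUMMY_MAX_VALUE_SIZE)
        return NULL;

    p = malloc(offsetof(dummy_ptr_sketch, data) + len);
    if (p == NULL)
        return NULL;
    p->len = len;
    if (len > 0)
        memcpy(p->data, value, len);
    return p;
}

/*
 * "Detoast" by copying the inline bytes back out. Returns the number
 * of bytes written, or 0 if the output buffer is too small (an empty
 * value also yields 0).
 */
static size_t
dummy_detoast(const dummy_ptr_sketch *p, void *out, size_t out_cap)
{
    if (p->len > out_cap)
        return 0;
    if (p->len > 0)
        memcpy(out, p->data, p->len);
    return p->len;
}
```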

3) 3_toaster_snapshot_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_snapshot
The patch implements a way to distinguish row versions in toasted
values, so that common parts of toasted values can be shared between
different versions of a row.

4) 4_bytea_appendable_toaster_v1.patch.gz
https://github.com/postgrespro/postgres/tree/bytea_appendable_toaster
A contrib module that implements a toaster for non-compressed bytea
columns, which allows fast appending to an existing bytea value. The
appended tail is stored directly in the toast pointer if there is
enough room for it.
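
The "store the tail in the pointer if it fits" rule can be sketched in
plain C like this. The struct appendable_ptr_sketch, the
INLINE_TAIL_MAX budget of 64 bytes and both functions are assumptions
for illustration; the actual pointer layout and size limit are defined
by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical budget for bytes kept inline in the pointer. */
#define INLINE_TAIL_MAX 64

/* Hypothetical pointer: out-of-line prefix plus an inline tail. */
typedef struct appendable_ptr_sketch
{
    size_t        stored_len;   /* bytes already in toast storage */
    size_t        tail_len;     /* bytes currently held in tail[] */
    unsigned char tail[INLINE_TAIL_MAX];
} appendable_ptr_sketch;

/*
 * Try the fast path: append into the inline tail without touching
 * toast storage. Returns false, leaving the pointer unchanged, when
 * there is not enough room; the caller would then take the slow path
 * (for example, writing the tail out to toast storage first).
 */
static bool
try_append_inline(appendable_ptr_sketch *p, const void *data, size_t len)
{
    assert(p != NULL && p->tail_len <= INLINE_TAIL_MAX);

    if (len > INLINE_TAIL_MAX - p->tail_len)
        return false;
    if (len > 0)
        memcpy(p->tail + p->tail_len, data, len);
    p->tail_len += len;
    return true;
}

/* Logical length of the value: stored prefix plus inline tail. */
static size_t
logical_len(const appendable_ptr_sketch *p)
{
    return p->stored_len + p->tail_len;
}
```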

Note: the patch modifies byteacat() to support the contrib toaster. This
looks ugly; the contrib module should probably create a new
concatenation function instead.

We are open to any questions, discussion, objections and advice.
Thank you.

People behind this work:
Oleg Bartunov
Nikita Gluhov
Nikita Malakhov
Teodor Sigaev


[1]
https://www.postgresql.org/message-id/flat/de83407a-ae3d-a8e1-a788-920eb334f25b@sigaev.ru 

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/