Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)
Date
Msg-id CAMT0RQQjZ+tWyA60mcdTHqG2x-FA2722i4YcLbvoSEeK3axO8w@mail.gmail.com
In response to Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)  (Nikita Malakhov <hukutoc@gmail.com>)
List pgsql-hackers
I still think we should go with direct toast tid pointers in varlena
and not some kind of oid.

It will remove the need for any oid management and will also be
many, many orders of magnitude faster for large tables (just 2x faster
for small in-memory tables).

I plan to go over Michael's patch set here and see how much change is
needed to add "direct toast".

My goals are:
1. fast lookup by skipping the index lookup
2. making the toast pointer in main heap as small as possible -
hopefully just the 6 bytes of tid pointer - so that scans that do not
need toasted values get more tuples from each page
3. adding all the (optional) extra data to the toast chunk record, as
there we are free to add whatever is needed
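
To make goal 2 concrete, here is a rough size comparison. Today's on-disk
external toast pointer carries a 16-byte payload (varatt_external) behind a
2-byte header, while a heap tid is 6 bytes. The struct and function names
below are hypothetical illustrations, not anything from Michael's patch set,
and the sketch ignores the varlena header that would still precede the tid.

```c
#include <stdint.h>

/* Illustrative copy of the field layout of the current external
 * toast pointer payload (varatt_external); names are made up here. */
typedef struct OidToastPointer
{
    int32_t  rawsize;     /* original data size, including header */
    uint32_t extinfo;     /* external size and compression method */
    uint32_t valueid;     /* oid of the value within the toast table */
    uint32_t toastrelid;  /* oid of the toast table itself */
} OidToastPointer;

/* Hypothetical direct pointer with the same shape as a heap tid:
 * a 32-bit block number split into two halves plus a line offset. */
typedef struct DirectToastTid
{
    uint16_t block_hi;
    uint16_t block_lo;
    uint16_t offset;
} DirectToastTid;

/* Build a direct pointer from a block number and a line offset. */
static DirectToastTid
direct_toast_tid_make(uint32_t block, uint16_t offset)
{
    DirectToastTid tid;

    tid.block_hi = (uint16_t) (block >> 16);
    tid.block_lo = (uint16_t) (block & 0xFFFF);
    tid.offset = offset;
    return tid;
}

/* Recover the block number; no index lookup is involved. */
static uint32_t
direct_toast_tid_block(DirectToastTid tid)
{
    return ((uint32_t) tid.block_hi << 16) | tid.block_lo;
}
```

Under these assumptions the pointer payload in the main heap shrinks from 16
bytes to 6, and everything else (raw size, compression info) would have to
move into the toast chunk record, which is what goal 3 is about.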
Currently I plan to introduce something like this for the toast chunk record:

   Column    |  Type   | Storage | Notes
-------------+---------+---------+----------------------------------------------
 chunk_id    | oid     | plain   | 0 when not using the toast index; 0xfffe
             |         |         | means non-deletable, for example when used
             |         |         | as a dictionary for multiple toasted values
 chunk_seq   | integer | plain   | if not 0 when referenced from a toast
             |         |         | pointer, the toasted data starts at
             |         |         | toast_pages[0] (or below it in that tree),
             |         |         | which *must* have chunk_id = 0
 chunk_data  | bytea   | plain   |

-- added fields

 toast_pages | tid[]   | plain   | can be chained or make up a tree
 offsets     | int[]   | plain   | starting offsets of the toast_pages (octets
             |         |         | or type-specific units); the upper bit
             |         |         | indicates that a new compressed span starts
             |         |         | at that offset, the 2nd highest bit
             |         |         | indicates that the page is another tree page
 comp_method | int     | plain   | compression method used; maybe should be an
             |         |         | enum?
 dict_pages  | tid[]   | plain   | pages to use as compression dictionary, up
             |         |         | to N pages, one level
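
The two flag bits described for the offsets entries could be packed and
unpacked roughly as below. This is only a sketch: it assumes each entry is
handled as an unsigned 32-bit value, so the "upper bit" is bit 31 and the
"2nd highest bit" is bit 30, and all macro and function names are made up
for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignment for one 32-bit offsets[] entry. */
#define TOAST_OFF_NEW_SPAN   UINT32_C(0x80000000)  /* new compressed span starts here */
#define TOAST_OFF_TREE_PAGE  UINT32_C(0x40000000)  /* referenced page is another tree page */
#define TOAST_OFF_VALUE_MASK UINT32_C(0x3FFFFFFF)  /* remaining 30 bits: starting offset */

/* Decoded view of one offsets[] entry. */
typedef struct ToastOffsetEntry
{
    uint32_t offset;
    bool     starts_compressed_span;
    bool     is_tree_page;
} ToastOffsetEntry;

/* Pack a starting offset and the two flags into one entry.
 * Offsets that do not fit in 30 bits are silently truncated here. */
static uint32_t
toast_offset_encode(uint32_t offset, bool new_span, bool tree_page)
{
    uint32_t raw = offset & TOAST_OFF_VALUE_MASK;

    if (new_span)
        raw |= TOAST_OFF_NEW_SPAN;
    if (tree_page)
        raw |= TOAST_OFF_TREE_PAGE;
    return raw;
}

/* Split one entry back into its offset and flags. */
static ToastOffsetEntry
toast_offset_decode(uint32_t raw)
{
    ToastOffsetEntry e;

    e.offset = raw & TOAST_OFF_VALUE_MASK;
    e.starts_compressed_span = (raw & TOAST_OFF_NEW_SPAN) != 0;
    e.is_tree_page = (raw & TOAST_OFF_TREE_PAGE) != 0;
    return e;
}
```

With this assignment a single toast_pages level could address offsets up to
2^30 - 1 units; anything larger would need tree pages or a wider entry type.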

This seems to be flexible enough to allow for both compression and
efficient partial updates
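
As one illustration of why the offsets array would help partial updates: to
find the page that holds a given position, a binary search over the starting
offsets is enough. The sketch below assumes a flat, single-level offsets
array sorted by starting offset, with the two top bits of each 32-bit entry
reserved for flags; the function name and mask are hypothetical, and tree
pages are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed: the low 30 bits of each entry hold the starting offset. */
#define OFFSET_VALUE_MASK UINT32_C(0x3FFFFFFF)

/*
 * Return the index of the page whose span contains 'target', i.e. the
 * last entry whose starting offset is <= target, or -1 if there is none
 * (empty array, or target lies before the first starting offset).
 */
static long
toast_page_for_offset(const uint32_t *offsets, size_t n, uint32_t target)
{
    size_t lo = 0;
    size_t hi = n;      /* search the half-open range [lo, hi) */

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if ((offsets[mid] & OFFSET_VALUE_MASK) <= target)
            lo = mid + 1;   /* entry starts at or before target */
        else
            hi = mid;       /* entry starts after target */
    }
    /* lo is now the count of entries starting at or before target */
    return (long) lo - 1;
}
```

A partial update touching one position would then only need to rewrite the
page at that index (or the compressed span it belongs to), rather than the
whole toasted value.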

---
Hannu


On Tue, Jul 8, 2025 at 8:31 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi!
>
> Greg, thanks for the interest in our work!
>
> Michael, one more thing forgot to mention yesterday -
> #define TOAST_EXTERNAL_INFO_SIZE (VARTAG_ONDISK_OID + 1)
> static const toast_external_info toast_external_infos[TOAST_EXTERNAL_INFO_SIZE]
> VARTAG_ONDISK_OID historically has a value of 18
> and here we got an array of 19 members with only 2 valid ones.
>
> What do you think about having an individual
> TOAST value id counter per relation instead of using
> a common one? I think this is a very promising approach,
> but a decision must be made where it should be stored.
>
> --
> Regards,
> Nikita Malakhov
> Postgres Professional
> The Russian Postgres Company
> https://postgrespro.ru/


