On Tue, Jul 8, 2025 at 9:37 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Jul-08, Hannu Krosing wrote:
>
> > I still think we should go with direct toast tid pointers in varlena
> > and not some kind of oid.
>
> I think this can be made to work, as long as we stop seeing the toast
> table just like a normal heap table containing normal tuples. A lot to
> reimplement though -- vacuum in particular.
Non-FULL vacuum should already work. Only commands like VACUUM FULL
and CLUSTER which move tuples around should be disabled on TOAST
tables.
What other parts do you think need re-implementing in addition to
skipping the index lookup part and using the tid directly?
The fact that per-page chunk_tid arrays also allow tree structures
should give us much more flexibility in implementing
in-place-updatable structured storage in something otherwise very
similar to toast, but this is not required for just moving from oid +
index ==> tid to using the tid directly.
I think that having a toast table as a normal table with full MVCC is
actually a good thing, as it can implement the "array element update"
as a real partial update of only the affected parts and not the
current 'copy everything' way of doing this. We already do collect the
array element update in the parse tree in a special way, now we just
need to have types that can do the partial update by changing a tid or
two in the chunk_tids array (and adjust the offsets array if needed).
This should make it possible to do partial updates both for
UPDATE t SET theintarray[3] = 5, theintarray[4] = 7 WHERE id = 1;
and even for something like this
hannuk=# select * from jtab;
 id |             j
----+----------------------------
  1 | {"a": 3, "b": 2}
  2 | {"c": 1, "d": [10, 20, 3]}
(2 rows)
hannuk=# update jtab SET j['d'][3] = '7' WHERE id = 2;
UPDATE 1
hannuk=# select * from jtab;
 id |               j
----+-------------------------------
  1 | {"a": 3, "b": 2}
  2 | {"c": 1, "d": [10, 20, 3, 7]}
(2 rows)
when the JSON data is so large that the changed part is in its own chunk.
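
To make the chunk_tids / offsets idea a bit more concrete, here is a toy
model in Python. It is only a sketch and not PostgreSQL code: ToastStore,
ToastedValue, partial_update and the tiny CHUNK_SIZE are all made-up names
and values for illustration, and it assumes a non-empty value. A partial
update writes new chunk tuples only for the affected byte range, swaps
their tids into chunk_tids, and shifts the later offsets when the length
changes. Old chunk tuples are left in place, as they would be under MVCC
until vacuum removes them.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

CHUNK_SIZE = 8  # hypothetical and tiny on purpose, to keep the example readable


@dataclass
class ToastStore:
    """Toy stand-in for the toast heap: maps a tid (block, item) to chunk bytes."""
    tuples: dict = field(default_factory=dict)
    next_item: int = 1

    def put(self, data: bytes) -> tuple:
        tid = (0, self.next_item)  # everything lives in block 0 in this model
        self.next_item += 1
        self.tuples[tid] = data
        return tid


@dataclass
class ToastedValue:
    chunk_tids: list  # tid of each chunk, in value order
    offsets: list     # starting byte offset of each chunk within the value


def store_value(store: ToastStore, data: bytes) -> ToastedValue:
    """Split data into chunks and remember each chunk's tid and start offset."""
    tids, offsets = [], []
    for pos in range(0, len(data), CHUNK_SIZE):
        tids.append(store.put(data[pos:pos + CHUNK_SIZE]))
        offsets.append(pos)
    return ToastedValue(tids, offsets)


def fetch_value(store: ToastStore, value: ToastedValue) -> bytes:
    """Fetch chunks directly by tid; no index lookup is involved."""
    return b"".join(store.tuples[tid] for tid in value.chunk_tids)


def partial_update(store, value, pos, old_len, new_bytes) -> int:
    """Replace old_len bytes at pos with new_bytes.

    Only the chunks overlapping the changed range are rewritten.
    Returns the number of new chunk tuples written.
    """
    first = bisect_right(value.offsets, pos) - 1
    last = bisect_right(value.offsets, max(pos + old_len - 1, pos)) - 1
    affected = b"".join(store.tuples[t] for t in value.chunk_tids[first:last + 1])
    rel = pos - value.offsets[first]
    patched = affected[:rel] + new_bytes + affected[rel + old_len:]

    start = value.offsets[first]
    new_tids, new_offsets = [], []
    for i in range(0, len(patched), CHUNK_SIZE):
        new_tids.append(store.put(patched[i:i + CHUNK_SIZE]))  # old tuples stay
        new_offsets.append(start + i)

    delta = len(new_bytes) - old_len  # later chunks shift by this many bytes
    value.offsets[first:] = new_offsets + [o + delta for o in value.offsets[last + 1:]]
    value.chunk_tids[first:last + 1] = new_tids
    return len(new_tids)


store = ToastStore()
val = store_value(store, b"0123456789abcdefghijklmnopqrstuv")
written = partial_update(store, val, pos=18, old_len=2, new_bytes=b"XY")
```

In this run only the third chunk is rewritten and the other three tids
stay as they were. The offsets array is what lets chunks end up with
different sizes after an update that changes the length.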
> Maybe it can be thought of
> as a new table AM. Not an easy project, I reckon.
I would prefer it to be an extension of the current toast - just another
varatt_* type - as then you can upgrade to the new storage CONCURRENTLY,
the same way as you can currently switch compression methods.
> --
> Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/