Thread: Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

From
Aleksander Alekseev
Date:
Hi hackers,

Recently in one discussion a user complained [1] about
counterintuitive behavior of toast_tuple_target. Here is a quote:

"""
Table size 177.74 GB
Toast table size 12 GB
Indexes size 33.49 GB

This table is composed of small columns "id", "hash", "size", and a
mid~big (2~512kb) jsonb.

I don't want to be forced to read the big column when doing seq scans,
so I tried to set toast_tuple_target = 128, to exclude the big column,
but even after a VACUUM FULL i couldn't get pg to toast the big
column. Am I doing something wrong?
"""

Arguably in this case the user may actually want to store the JSONB
fields by the foreign key.

However the user may have a good point that setting toast_tuple_target
< TOAST_TUPLE_THRESHOLD effectively does nothing. This happens because
[2]:

"""
The TOAST management code is triggered only when a row value to be
stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally
2 kB). The TOAST code will compress and/or move field values
out-of-line until the row value is shorter than toast_tuple_target
bytes (also normally 2 kB, adjustable) or no more gains can be had.
"""

... TOAST is _triggered_ by TOAST_TUPLE_THRESHOLD but tries to
compress the tuple until toast_tuple_target bytes. This is indeed
somewhat confusing.

I see several ways of solving this.

1. Forbid setting toast_tuple_target < TOAST_TUPLE_THRESHOLD
2. Consider using something like RelationGetToastTupleTarget(rel,
TOAST_TUPLE_THRESHOLD) in heapam.c:2250, heapam.c:3625 and
rewriteheap.c:636 and modify the documentation accordingly.
3. Add a separate user-defined table setting toast_tuple_threshold
similar to toast_tuple_target.

Thoughts?

[1]: https://t.me/pg_sql/62265
[2]: https://www.postgresql.org/docs/current/storage-toast.html

-- 
Best regards,
Aleksander Alekseev



Re: Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

From
Aleksander Alekseev
Date:
Hi hackers,

> 1. Forbid setting toast_tuple_target < TOAST_TUPLE_THRESHOLD

Reading my own email I realized that this of course was stupid. For
sure this is not an option. It's getting late in my timezone, sorry :)

-- 
Best regards,
Aleksander Alekseev



Re: Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

From
Nikita Malakhov
Date:
Hi!

I've noticed this behavior half a year ago during experiments with TOAST, and 
TOAST_TUPLE_THRESHOLD really works NOT the way it is thought to.
I propose something like FORCE_TOAST flag/option as column option (stored
in attoptions), because we already encountered multiple cases where data 
should be stored externally despite its size.
Currently I'm working on passing Toaster options in attoptions.

Thoughts?

On Wed, Sep 14, 2022 at 7:12 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi hackers,

> 1. Forbid setting toast_tuple_target < TOAST_TUPLE_THRESHOLD

Reading my own email I realized that this of course was stupid. For
sure this is not an option. It's getting late in my timezone, sorry :)

--
Best regards,
Aleksander Alekseev




--
Regards,
Nikita Malakhov
Postgres Professional 

Re: Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

From
David Rowley
Date:
On Thu, 15 Sept 2022 at 04:04, Aleksander Alekseev
<aleksander@timescale.com> wrote:
> 1. Forbid setting toast_tuple_target < TOAST_TUPLE_THRESHOLD
> 2. Consider using something like RelationGetToastTupleTarget(rel,
> TOAST_TUPLE_THRESHOLD) in heapam.c:2250, heapam.c:3625 and
> rewriteheap.c:636 and modify the documentation accordingly.
> 3. Add a separate user-defined table setting toast_tuple_threshold
> similar to toast_tuple_target.
>
> Thoughts?

There was some discussion on this problem in [1].

The problem with #2 is that if you look at
heapam_relation_needs_toast_table(), it only decides if the toast
table should be created based on (tuple_length >
TOAST_TUPLE_THRESHOLD). So if you were to change the logic as you
describe for #2 then there might not be a toast table during an
INSERT/UPDATE.

The only way to fix that would be to ensure that we reconsider if we
should create a toast table or not when someone changes the
toast_tuple_target reloption.  That can't be done under
ShareUpdateExclusiveLock, so we'd need to obtain an
AccessExclusiveLock instead when changing the toast_tuple_target
reloption. That might upset some people.

The general direction of [1] was to just increase the minimum setting
to TOAST_TUPLE_THRESHOLD, but there were some concerns about breaking
pg_dump as we'd have to error if someone does ALTER TABLE to set the
toast_tuple_target reloption lower than the newly defined minimum
value.

I don't quite follow you on #3. If there's no toast table we can't toast.

David

[1] https://www.postgresql.org/message-id/20190403063759.GF3298@paquier.xyz



Re: Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

From
Nikita Malakhov
Date:
Hi!

As it is seen from the code (toasting.c and further) Toast tables are created immediately
when a new relation with the TOASTable column is created. Practically, there could occur
the case when Toast table does not exist and we should of course check for that.

TOAST_TUPLE_THRESHOLD is not only one which decides should be value stored
externally, this is slightly more complex and less obvious logic:
(see heapam.c, heap_prepare_insert())
else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)

as you can see here is another condition - HeapTupleHasExternal, which is set in
heap_fill_tuple and lower in fill_val, where the infomask bit HEAP_HASEXTERNAL is set.

So when I experimented with the TOAST I'd to add a new flag which forced the value to be
TOASTed regardless of its size.

Also, TOAST_TUPLE_THRESHOLD sets overall tuple size over which it would be considered
to be toasted, and has its minimum value that could not be decreased further.

In [1] (the Pluggable TOAST) we suggest making this an ontion for Toaster.



On Thu, Sep 15, 2022 at 3:05 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 15 Sept 2022 at 04:04, Aleksander Alekseev
<aleksander@timescale.com> wrote:
> 1. Forbid setting toast_tuple_target < TOAST_TUPLE_THRESHOLD
> 2. Consider using something like RelationGetToastTupleTarget(rel,
> TOAST_TUPLE_THRESHOLD) in heapam.c:2250, heapam.c:3625 and
> rewriteheap.c:636 and modify the documentation accordingly.
> 3. Add a separate user-defined table setting toast_tuple_threshold
> similar to toast_tuple_target.
>
> Thoughts?

There was some discussion on this problem in [1].

The problem with #2 is that if you look at
heapam_relation_needs_toast_table(), it only decides if the toast
table should be created based on (tuple_length >
TOAST_TUPLE_THRESHOLD). So if you were to change the logic as you
describe for #2 then there might not be a toast table during an
INSERT/UPDATE.

The only way to fix that would be to ensure that we reconsider if we
should create a toast table or not when someone changes the
toast_tuple_target reloption.  That can't be done under
ShareUpdateExclusiveLock, so we'd need to obtain an
AccessExclusiveLock instead when changing the toast_tuple_target
reloption. That might upset some people.

The general direction of [1] was to just increase the minimum setting
to TOAST_TUPLE_THRESHOLD, but there were some concerns about breaking
pg_dump as we'd have to error if someone does ALTER TABLE to set the
toast_tuple_target reloption lower than the newly defined minimum
value.

I don't quite follow you on #3. If there's no toast table we can't toast.

David

[1] https://www.postgresql.org/message-id/20190403063759.GF3298@paquier.xyz




--
Regards,
Nikita Malakhov
Postgres Professional 

Re: Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

From
Aleksander Alekseev
Date:
Hi David,

> There was some discussion on this problem in [1].
> [1] https://www.postgresql.org/message-id/20190403063759.GF3298@paquier.xyz

Thanks for sharing this discussion. I missed it.

> The problem with #2 is that if you look at
> heapam_relation_needs_toast_table(), it only decides if the toast
> table should be created based on (tuple_length >
> TOAST_TUPLE_THRESHOLD). So if you were to change the logic as you
> describe for #2 then there might not be a toast table during an
> INSERT/UPDATE.
>
> The only way to fix that would be to ensure that we reconsider if we
> should create a toast table or not when someone changes the
> toast_tuple_target reloption.  That can't be done under
> ShareUpdateExclusiveLock, so we'd need to obtain an
> AccessExclusiveLock instead when changing the toast_tuple_target
> reloption. That might upset some people.

Personally, if I would choose between (A) a feature that is
potentially expensive but useful to many and (B) a feature that in
practice is pretty much useless to most of the users, I would choose
(A). Maybe we will be able to make it a little less expensive if we
optimistically take a shared lock first and then, if necessary, take
an exclusive lock.

> The general direction of [1] was to just increase the minimum setting
> to TOAST_TUPLE_THRESHOLD, but there were some concerns about breaking
> pg_dump as we'd have to error if someone does ALTER TABLE to set the
> toast_tuple_target reloption lower than the newly defined minimum
> value.

Yep, this doesn't seem to be an option.

> I don't quite follow you on #3. If there's no toast table we can't toast.

The case I had in mind was the one when we already have a TOAST table
and then change toast_tuple_target.

In this scenario TOAST_TUPLE_THRESHOLD is used to decide whether TOAST
should be triggered for a given tuple. For how long TOAST will keep
compressing the tuple is controlled by toast_tuple_target, not by
TOAST_TUPLE_THRESHOLD. So the user has control of "target" but there
is no control of "threshold". If the user could set both "threshold"
and "target" low this would solve the problem the user originally had
(the one described in the first email).

Adding a separate "threshold" option doesn't solve the problem when we
change it and there is no TOAST table yet.

I wonder though if we really need two entities - a "threshold" and a
"target". It seems to me that it should actually be one value, no?

-- 
Best regards,
Aleksander Alekseev