Thread: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
Dominique Devienne
Date:
Hi. If I do either or both of the changes below:

ddevienne=> alter table dd alter column val set compression lz4;
ALTER TABLE
ddevienne=> alter table dd alter column val set storage extended;
ALTER TABLE

According to the doc, the table is NOT changed.
In my case, I DO want to have the bytea column rewritten
according to the new STORAGE and/or COMPRESSION.
The doc doesn't mention how to achieve that.

VACUUM?
VACUUM FULL?
Else?

Also, pre-v14, is eXtended STORAGE always compressed with pglz?
The reason I'm asking, is that I don't see much difference between
sum(length(bytea_col)) and pg_total_relation_size(reltoastrelid), thus
I'm not quite sure whether compression happens or not, in fact.

Note that most of the data is floating point arrays inside a bytea columns,
thus I guess unless one shuffles the bytes around, it doesn't compress too well.

Thanks, --DD

Re: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
Michael Paquier
Date:
On Mon, Oct 02, 2023 at 04:42:15PM +0200, Dominique Devienne wrote:
> According to the doc, the table is NOT changed.
> In my case, I DO want to have the bytea column rewritten
> according to the new STORAGE and/or COMPRESSION.
> The doc doesn't mention how to achieve that.

Yes, the compression type of the existed toasted values are not
changed after an ALTER SET, and that's not something one would do on a
daily basis, either.

> VACUUM?
> VACUUM FULL?
> Else?

VACUUM was considered as an option to force a rewrte, but it has been
removed because we were not sure that this is the correct path to do
so or that in some cases forcing the hand of the user was incorrect.
It was also creating a penalty in some of the hot loops of area:
commit: dbab0c07e5ba1f19a991da2d72972a8fe9a41bda
committer: Michael Paquier <michael@paquier.xyz>
date: Mon, 14 Jun 2021 09:25:50 +0900
Remove forced toast recompression in VACUUM FULL/CLUSTER

Related thread:
https://postgr.es/m/20210527003144.xxqppojoiwurc2iz@alap3.anarazel.de
--
Michael

Attachment

Re: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
rob stone
Date:
Hello,

On Tue, 2023-10-03 at 08:57 +0900, Michael Paquier wrote:
> On Mon, Oct 02, 2023 at 04:42:15PM +0200, Dominique Devienne wrote:
> > According to the doc, the table is NOT changed.
> > In my case, I DO want to have the bytea column rewritten
> > according to the new STORAGE and/or COMPRESSION.
> > The doc doesn't mention how to achieve that.
>
> Yes, the compression type of the existed toasted values are not
> changed after an ALTER SET, and that's not something one would do on
> a
> daily basis, either.
>
> > VACUUM?
> > VACUUM FULL?
> > Else?
>
> VACUUM was considered as an option to force a rewrte, but it has been
> removed because we were not sure that this is the correct path to do
> so or that in some cases forcing the hand of the user was incorrect.
> It was also creating a penalty in some of the hot loops of area:
> commit: dbab0c07e5ba1f19a991da2d72972a8fe9a41bda
> committer: Michael Paquier <michael@paquier.xyz>
> date: Mon, 14 Jun 2021 09:25:50 +0900
> Remove forced toast recompression in VACUUM FULL/CLUSTER
>
> Related thread:
> https://postgr.es/m/20210527003144.xxqppojoiwurc2iz@alap3.anarazel.de
> --
> Michael


Would running CLUSTER on the table use the new parameters for the re-
write?

The docs about CLUSTER do not mention anything about toast'ed data. If
the toast data is huge it would be held in a separate file on disk. If
the toast column is being compressed, and its size is now less than the
value used as to whether a separate file is required or not, will the
outcome be a single row for the re-write and deletion of the "old"
toast'ed entry?

Robert




Re: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
Laurenz Albe
Date:
On Tue, 2023-10-03 at 12:33 +1100, rob stone wrote:
> Would running CLUSTER on the table use the new parameters for the re-
> write?

No, as far as I know.

You'd need something like

  -- rewrite all tuples
  UPDATE tab SET id = id;
  -- get rid of the bloat
  VACUUM (FULL) tab;

Yours,
Laurenz Albe



Re: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
Michael Paquier
Date:
On Tue, Oct 03, 2023 at 06:31:27AM +0200, Laurenz Albe wrote:
> On Tue, 2023-10-03 at 12:33 +1100, rob stone wrote:
>> Would running CLUSTER on the table use the new parameters for the re-
>> write?
>
> No, as far as I know.

Note that under the hoods VACUUM FULL and CLUSTER use the same code
paths when doing their stuff.

> You'd need something like
>
>   -- rewrite all tuples
>   UPDATE tab SET id = id;
>   -- get rid of the bloat
>   VACUUM (FULL) tab;

I'm afraid so, and get ready for a burst of WAL that depends on the
size of your relation if you are too aggressive with the updates.  You
could do that in periodic steps, as well.
--
Michael

Attachment

Re: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
Dominique Devienne
Date:
On Tue, Oct 3, 2023 at 6:45 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Oct 03, 2023 at 06:31:27AM +0200, Laurenz Albe wrote:
> On Tue, 2023-10-03 at 12:33 +1100, rob stone wrote:
>> Would running CLUSTER on the table use the new parameters for the re-
>> write?
>
> No, as far as I know.

Note that under the hoods VACUUM FULL and CLUSTER use the same code
paths when doing their stuff.

> You'd need something like
>   -- rewrite all tuples
>   UPDATE tab SET id = id;
>   -- get rid of the bloat
>   VACUUM (FULL) tab;

OK. I didn't expect this, but I can deal with it. Especially since this is basically what I'm
doing anyway. Remember the thread on this ML about "chunking" large bytea values?
Well, this is about trying out several chunk sizes and/or compression, to find the right
config / tuning for our access patterns. We've already done the "rechunking", and I'm adding
the changes in compression (and thus storage, when disabling compression).
 
I'm afraid so, and get ready for a burst of WAL that depends on the
size of your relation if you are too aggressive with the updates.  You
could do that in periodic steps, as well.

 In my case, it's OK not to be transactional, for these experiments. Is there a way
to lock the table and do the rewriting w/o generating any WAL? I don't have any experience
with unlogged tables, but should I take an exclusive lock on the table, switch it to unlogged,
rewrite, and switch it back to logged?

What about my last question about whether storage=extended always being compressed?
Given that I don't see much compression, at least when looking indirectly via total-rel-sizes?
Is there a way to evaluate the compression ratios achieved on TOASTED values?

Thanks, --DD

Re: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
Michael Paquier
Date:
On Tue, Oct 03, 2023 at 09:08:49AM +0200, Dominique Devienne wrote:
>  In my case, it's OK not to be transactional, for these experiments. Is
> there a way
> to lock the table and do the rewriting w/o generating any WAL? I don't have
> any experience
> with unlogged tables, but should I take an exclusive lock on the table,
> switch it to unlogged,
> rewrite, and switch it back to logged?

Switching a table back to be logged requires all its 8k blocks to be
WAL-logged, so that would be roughly the same as a plain UPDATE.
--
Michael

Attachment

Re: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
Laurenz Albe
Date:
On Tue, 2023-10-03 at 09:08 +0200, Dominique Devienne wrote:
>  In my case, it's OK not to be transactional, for these experiments. Is there a way
> to lock the table and do the rewriting w/o generating any WAL? I don't have any experience
> with unlogged tables, but should I take an exclusive lock on the table, switch it to unlogged,
> rewrite, and switch it back to logged?

The only way to avoid WAL is to use unlogged tables, but they lose their data after
a crash.  If you change an unlogged table to a logged table, the whol table ends up
in WAL, so you won't save anything that way.

The best thing that you can do to reduce the amount of WAL is to TRUNCATE and
populate the table in the same transaction.  Then, if you have "wal_level = minimal",
PostgreSQL can forego writing WAL information.  But you cannot have archive recovery
and replication with "wal_level = minimal".

> What about my last question about whether storage=extended always being compressed?
> Given that I don't see much compression, at least when looking indirectly via total-rel-sizes?
> Is there a way to evaluate the compression ratios achieved on TOASTED values?

If you don't see much compression in your toasted data, it is possible that you are
storing data that are already compressed.  In that case, PostgreSQL will attempt
compression with its algorithms that are geared at speed rather than good compression.
If it finds that the data grew after compression, it will discard the compressed value
and continue with the original value.  To avoid that useless compression attempt,
you should use STORAGE EXTERNAL in such cases.

Yours,
Laurenz Albe



Re: How to force "re-TOAST" after changing STORAGE or COMPRESSION?

From
Dominique Devienne
Date:
On Tue, Oct 3, 2023 at 9:17 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2023-10-03 at 09:08 +0200, Dominique Devienne wrote:
>  In my case, it's OK not to be transactional, for these experiments. Is there a way
> to lock the table and do the rewriting w/o generating any WAL? I don't have any experience
> with unlogged tables, but should I take an exclusive lock on the table, switch it to unlogged,
> rewrite, and switch it back to logged?

The only way to avoid WAL is to use unlogged tables, but they lose their data after
a crash.  If you change an unlogged table to a logged table, the whole table ends up
in WAL, so you won't save anything that way.

Thanks Laurenz and Michael. I didn't know that. Interesting.
And logical I guess, considering replication.
 
The best thing that you can do to reduce the amount of WAL is to TRUNCATE and
populate the table in the same transaction.  Then, if you have "wal_level = minimal",
PostgreSQL can forego writing WAL information.  But you cannot have archive recovery
and replication with "wal_level = minimal"

No idea about the granularity of wal_level (cluster wide? db wide? more granular?),
but that doesn't sound practical on a cluster shared with others, I suspect.

It would also force to load all the bytea values client-side, instead of 1-by-1, which
is possible (a few GBs typically, max), but not the way we've coded it.
 
> What about my last question about whether storage=extended always being compressed?
> Given that I don't see much compression, at least when looking indirectly via total-rel-sizes?
> Is there a way to evaluate the compression ratios achieved on TOASTED values?

If you don't see much compression in your toasted data, it is possible that you are
storing data that are already compressed.  In that case, PostgreSQL will attempt
compression with its algorithms that are geared at speed rather than good compression.
If it finds that the data grew after compression, it will discard the compressed value
and continue with the original value.  To avoid that useless compression attempt,
you should use STORAGE EXTERNAL in such cases.

I know for sure the bytea values are not compressed. Again, they are numerical arrays
(float, double), which don't typically compress well, especially the mantissa part. So it
sounds like there's no way to examine the hidden TOASTed values for compression ratio then.
Bummer. Thanks again Laurenz. --DD