Thread: Optimize external TOAST storage
Hi,
For Toast storage [1] in PostgreSQL, first, the attribute value is compressed
and then divided into chunks. The problem with storing compressed value is, if we
are not saving enough space such that it reduces the #chunks then we end up
adding extra decompression cost on every read.
Based on the discussion with Robert and Dilip, we tried to optimize this
process. The idea is to TOAST the compressed value only if we are saving at least
1 chunk (2KB default) of disk storage, else use the uncompressed one. In this way,
we will save decompression costs without losing much on storage.
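To make the rule concrete, here is a minimal standalone sketch of the decision (the constants and names are illustrative only, not the patch code; the real chunk size is TOAST_MAX_CHUNK_SIZE, roughly 2KB by default):

#include <stdbool.h>
#include <stdio.h>

#define CHUNK_SIZE 2000              /* illustrative; ~TOAST_MAX_CHUNK_SIZE */

/* Number of TOAST chunks needed for a value of the given size. */
static int
nchunks(int size)
{
    return (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* Store the compressed form only if it saves at least one chunk on disk. */
static bool
worth_storing_compressed(int orig_size, int compressed_size)
{
    return nchunks(compressed_size) < nchunks(orig_size);
}

int
main(void)
{
    /* Test 2 numbers: 16505 -> 16050 bytes is 9 chunks either way, so keep it uncompressed. */
    printf("%d\n", worth_storing_compressed(16505, 16050));   /* prints 0 */
    /* A hypothetical 8000 -> 3000 bytes is 4 chunks vs 2, so keep it compressed. */
    printf("%d\n", worth_storing_compressed(8000, 3000));     /* prints 1 */
    return 0;
}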
In our tests, we have observed read performance improvements of up to 28% by
giving up at most TOAST_MAX_CHUNK_SIZE (2KB) bytes of disk storage per value. The
gain is larger on large attributes (size > 4KB) because compression/decompression
cost increases with size.
However, we have found that this assumption does not hold when the data compression
ratio is high and the size is barely big enough to cause TOASTing. For example,
in test 4.b below, we didn't get any performance advantage, but the table
size grew by 42% because of storing uncompressed values.

Test Setup:
Create table t1_lz4 ( a text compression lz4, b text compression lz4);
-- Generate random data
create or replace function generate_att_data(len_info int)
returns text
language plpgsql
as
$$
declare
value text;
begin
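-- each md5() is 32 hex chars; with array separators that is roughly 33 bytes per element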
select array_agg(md5(g::text))
into value
from generate_series(1, round(len_info/33)::int) g;
return value;
end;
$$;
--Test
Select b from t1_lz4;
Test 1:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
Summary:
Attribute size: original: 7925 bytes, after compression: 7845 bytes
Time for select: head: 42 sec, patch: 37 sec, Performance Gain: 11%
table size: Head 1662 MB, Patch: 1662 MB
Test 2:
Data: rows 100000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
Summary:
Attribute size: original: 16505 bytes, after compression: 16050 bytes
Time for select: head: 35.6 sec, patch: 30 sec, Performance Gain: 14%
table size: Head 1636 MB, Patch: 1688 MB
Test 3:
Data: rows 50000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
Summary:
Attribute size: original: 31685 bytes, after compression: 30263 bytes
Time for select: head: 35.4 sec, patch: 25.5 sec, Performance Gain: 28%
table size: Head 1601 MB, Patch: 1601 MB
Test 4.a:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250) || generate_att_data(3885);
Summary:
Attribute size: original: 3885 bytes, after compression: 3645 bytes
Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
table size: Head 872 MB, Patch: 872 MB
Test 4.b (High compression):
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
Summary:
Attribute size: original: 3966 bytes, after compression: 2012 bytes
Time for select: head: 27 sec, patch: 26 sec, Performance Gain: 0%
table size: Head 612 MB, Patch: 872 MB
This is the worst case for this optimization, for the following 2 reasons.
First, the table size can grow by up to about 50%, which happens when the compressed
size is roughly half of the original size and yet barely large enough to cause TOASTing.
The table can't grow more than that, because if compression reduced the size any further
it would also reduce the #chunks, and then the compressed value would be stored.
Second, there is not much gain in performance, because with such a small attribute the
compressed form packs almost twice as many values per page, so each page access covers
roughly twice the number of rows; and a small value also means low
compression/decompression costs.
We have avoided such cases by applying the optimization only when the attribute size is > 4KB.
Regards,
Davinder
On Mon, Feb 28, 2022 at 3:22 PM davinder singh <davindersingh2692@gmail.com> wrote:
> The idea is to TOAST the compressed value only if we are saving at least
> 1 chunk (2KB default) of disk storage, else use the uncompressed one.

I think the idea looks interesting and the results are also promising. I have a few
initial comments; later I will do a more detailed review.

1.
+ if (*value == orig_toast_value)
+ toast_tuple_externalize(ttc, attribute, options);

Wouldn't it look cleaner to check whether the attribute is compressed or not, like
this: VARATT_IS_COMPRESSED(DatumGetPointer(*value))?

2.
+ /* Using the uncompressed data instead, deleting compressed data. */
+ pfree(DatumGetPointer(*value));

Change as below:
s/deleting compressed data/release memory for the compressed data/

3.
+ memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+ memcpy(toast_values_copy, toast_values, sizeof(toast_values));

Add some comment here: what are we trying to copy, and why?

4.
+ /* incompressible, ignore on subsequent compression passes. */
+ orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;

Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize
it? The comment says "ignore on subsequent compression passes",
but after this will there be more compression passes? If we do need to
set TOASTCOL_INCOMPRESSIBLE then the comment should explain this.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Thanks Dilip, I have fixed your comments, please find the updated patch.
On Tue, Mar 8, 2022 at 9:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ /* incompressible, ignore on subsequent compression passes. */
+ orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;
Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize
it? The comment says "ignore on subsequent compression passes",
but after this will there be more compression passes? If we do need to
set TOASTCOL_INCOMPRESSIBLE then the comment should explain this.
That was a mistake; this flag is not required at this point. Once the attribute is externalized it is marked as TOASTCOL_IGNORE, and such columns are not considered for compression, so I have removed it. Thanks for pointing it out.
Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com
On Thu, Mar 10, 2022 at 2:04 PM davinder singh <davindersingh2692@gmail.com> wrote:
> Thanks Dilip, I have fixed your comments, please find the updated patch.

Some more comments:

+ /*
+ * For those cases where storing compressed data is not optimal, We will use
+ * this pointer copy for referring uncompressed data.
+ */
+ memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+ memcpy(toast_values_copy, toast_values, sizeof(toast_values));

I think we can change the comment like below:

"Preserve references to the original uncompressed data before attempting the
compression, so that during externalization, if we decide not to store the
compressed data, we still have the original data with us. For more details refer
to the comments atop toast_tuple_opt_externalize."

+/*
+ * Don't save compressed data to external storage unless it saves I/O while
+ * accessing the same data by reducing the number of chunks.
+ */
+void
+toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute, int options,
+                            Datum orig_toast_value, ToastAttrInfo *orig_attr)

I think this comment does not explain what the problem with storing the compressed
data is. You need to expand it to say that if compression does not reduce the number
of chunks then there is no point in storing the compressed data, because then we will
have additional decompression cost whenever we fetch that data.

+ /* Sanity check: if data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+ toast_tuple_externalize(ttc, attribute, options);

I think we don't need "Sanity check:" here; the remaining part of the comment is fine.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
I think this is an interesting idea. My first thought is that it would be nice if we
didn't have to first compress the data to see whether we wanted to store it compressed
or not, but I don't think there is much we can do about that. In any case, we aren't
adding much work in the write path compared to the potential savings in the read path,
so that is probably okay.

Do you think it is worth making this configurable? I don't think it is outside the
realm of possibility for some users to care more about disk space than read
performance. And maybe the threshold for this optimization could differ based on the
workload.

 extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
                                     int options);
+extern void toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute,
+                                        int options, Datum old_toast_value,
+                                        ToastAttrInfo *old_toast_attr);

Could we bake this into toast_tuple_externalize() so that all existing callers benefit
from this optimization? Is there a reason to export both functions? Perhaps
toast_tuple_externalize() should be renamed and made static, and then this new
function could be called toast_tuple_externalize() (which would be a wrapper around
the internal function).

+ /* Sanity check: if data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+ toast_tuple_externalize(ttc, attribute, options);

With a --enable-cassert build, this line causes assertion failures in the call to
GetMemoryChunkContext() from pfree(). 'make check' is enough to reproduce it.
Specifically, it fails the following assertion:

    AssertArg(MemoryContextIsValid(context));

I didn't see an existing commitfest entry for this patch. I'd encourage you to add one
in the July commitfest:

    https://commitfest.postgresql.org/38/

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Sun, Mar 13, 2022 at 3:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Do you think it is worth making this configurable? I don't think it is
outside the realm of possibility for some users to care more about disk
space than read performance. And maybe the threshold for this optimization
could differ based on the workload.
I think here we can break it into two parts.
The first part: if the user cares about disk space more than read performance,
should we allow disabling this? That is a good idea, we can try it; let's see
what others say.
Regarding the 2nd part, configuring the threshold: based on our
experiments, we have fixed it for attributes with size > 2 * chunk_size.
The default chunk_size is 2KB and the page size is 8KB. While toasting, each
attribute is divided into chunks, and each page can hold at most 4 such chunks.
We only need to think about the space used by the last chunk of the attribute.
This means that for each optimized value we might use extra space in the range
(0B, 2KB]. I think this extra space is independent of the attribute size, so we
don't need to worry about configuring this threshold. Let me know if I missed
something here.
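(A tiny standalone sketch of why that bound holds, assuming the "save at least one chunk" rule and an approximate 2KB chunk size; the names and constants are illustrative, not the patch code:)

#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 2000              /* illustrative; ~TOAST_MAX_CHUNK_SIZE */

static int
nchunks(int size)
{
    return (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/*
 * Extra bytes kept per value when we give up on compression.  That only
 * happens when compression does not reduce the chunk count, and in that case
 * the difference is always less than one chunk, whatever the attribute size.
 */
static int
extra_bytes(int orig_size, int compressed_size)
{
    if (nchunks(compressed_size) < nchunks(orig_size))
        return 0;                    /* compressed form is stored */
    assert(orig_size - compressed_size < CHUNK_SIZE);
    return orig_size - compressed_size;
}

int
main(void)
{
    /* Test 2: 16505 -> 16050 bytes, same chunk count, 455 extra bytes of value data. */
    printf("%d\n", extra_bytes(16505, 16050));
    return 0;
}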
extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
int options);
+extern void toast_tuple_opt_externalize(ToastTupleContext *ttc, int attribute,
+ int options, Datum old_toast_value,
+ ToastAttrInfo *old_toast_attr);
Could we bake this into toast_tuple_externalize() so that all existing
callers benefit from this optimization? Is there a reason to export both
functions? Perhaps toast_tuple_externalize() should be renamed and made
static, and then this new function could be called
toast_tuple_externalize() (which would be a wrapper around the internal
function).
This function is used only in heap_toast_insert_or_update(), and all existing callers
use the new function only. As you suggested, I have renamed the new function to be the
wrapper and am now exporting only the new function.
+ /* Sanity check: if data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+ toast_tuple_externalize(ttc, attribute, options);
With a --enable-cassert build, this line causes assertion failures in the
call to GetMemoryChunkContext() from pfree(). 'make check' is enough to
reproduce it. Specifically, it fails the following assertion:
AssertArg(MemoryContextIsValid(context));
Thanks for pointing it out. This failure started because I was not handling the
case where the data was already compressed even before toasting. The following
check verifies whether the data is compressed or not, but that is not enough,
because we can't optimize the toasting if we didn't get the data in
uncompressed form from the source in the first place.
+ /* If data is not compressed then we can proceed as usual. */
+ if (!VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
The v1 patch didn't have this problem because it verified whether we had
compressed the data in this toasting round or not. I have changed back to the
earlier version:
+ /* If data is not compressed then we can proceed as usual. */
+ if (*value == orig_toast_value)
Regards,
Davinder
On Wed, Mar 16, 2022 at 01:03:38AM +0530, davinder singh wrote:
> Regarding the 2nd part, configuring the threshold: based on our
> experiments, we have fixed it for attributes with size > 2 * chunk_size.
> The default chunk_size is 2KB and the page size is 8KB. While toasting, each
> attribute is divided into chunks, and each page can hold at most 4 such chunks.
> We only need to think about the space used by the last chunk of the attribute.
> This means that for each optimized value we might use extra space in the range
> (0B, 2KB]. I think this extra space is independent of the attribute size, so we
> don't need to worry about configuring this threshold. Let me know if I missed
> something here.

Looking closer, ISTM there are two thresholds here. The first threshold is the minimum
size of the uncompressed data required to trigger this optimization. The purpose behind
this threshold is to avoid increasing disk space usage in return for very little gain in
the read path (i.e., skipping decompression). Specifically, this threshold helps avoid
increasing disk space usage when there are many small TOAST attributes. You've chosen
2x the TOAST chunk size as the threshold.

The second threshold is the compression ratio required to trigger this optimization.
The purpose behind this threshold is to avoid storing poorly compressed data so that we
can improve performance in the read path (i.e., skipping decompression). You've chosen
>=1 TOAST chunk as the threshold. This means that when I have an attribute with size 3x
a TOAST chunk, I'll only store it compressed when the compressed attribute is 2x a TOAST
chunk or smaller (a 33% improvement). If I have an attribute with size 100x a TOAST
chunk, I'll store it compressed when the compressed attribute is 99x a TOAST chunk or
smaller (a 1% improvement). IIUC this means that it is much more likely that you will
store a compressed attribute when the uncompressed data is larger. The benefit of this
approach is that you know you'll never give up more than one TOAST chunk of disk space
in the name of faster reads. The downside is that a huge number of smaller attributes
can require much more disk space than if they were universally compressed.

Perhaps this is missing one additional threshold. Perhaps there should be a compression
ratio for which the optimization is ignored. If my uncompressed data is 3x a TOAST chunk
and compressing it would save half of a TOAST chunk of disk space (a ~17% improvement),
maybe we should still store it compressed. I think this threshold would be most
important for servers with many small TOAST attributes.

I don't know whether it is necessary to make all of this configurable or whether we
should just use some heuristics to find a good balance. In any case, I imagine the
settings should look different for the different TOAST compression methods. Or maybe we
should just make the first threshold the configuration option for this feature. If you
set it to -1, the optimization is never used. If you set it to 0, you always try to save
on read time at the expense of disk space. If you set it to 2x the TOAST chunk size, you
might use up to 50% more disk space for read path optimizations. If you set it to 100x
the TOAST chunk size, you might use up to 1% more disk space for read optimizations. By
default, this configuration option could be set conservatively (e.g., 10x a TOAST chunk,
which could lead to a maximum of 10% more disk space).
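(For reference, a rough standalone sketch of that arithmetic; the function name is made up, and the model assumes the uncompressed value occupies exactly a whole number of full-size chunks:)

#include <stdio.h>

/*
 * Fraction of the uncompressed size that compression must save before the
 * "save at least one chunk" rule stores the compressed form, assuming the
 * uncompressed value occupies exactly nchunks full-size chunks.
 */
static double
required_saving(int nchunks)
{
    /* the compressed form must fit into nchunks - 1 chunks */
    return 1.0 / nchunks;
}

int
main(void)
{
    printf("%.1f%%\n", 100 * required_saving(3));     /* 33.3% */
    printf("%.1f%%\n", 100 * required_saving(100));   /* 1.0% */
    return 0;
}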
I suspect the increase in disk space would be much less than these numbers in most
cases, but it might be helpful to think of these things in terms of the worst case
scenario.

I apologize for thinking out loud a bit here, but I hope this gives you some insight
into my perspective on this. In general, I am skeptical that we can choose one
threshold that will work for all PostgreSQL installations in the known universe.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, Mar 15, 2022 at 5:48 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I apologize for thinking out loud a bit here, but I hope this gives you
> some insight into my perspective on this. In general, I am skeptical that
> we can choose one threshold that will work for all PostgreSQL installations
> in the known universe.

I would like to take a slightly contrary position. I think that a system here that
involves multiple knobs is going to be too complicated to be of any real-world use,
because almost nobody will understand it or tune it properly for their installation.
And who is to say that a single setting would even be appropriate for a whole
installation, as opposed to something that needs to be tuned for each individual table?
A single tunable might be OK, but what I would really like here is a heuristic that,
while perhaps not optimal in every environment, is good enough that a very high
percentage of users will never need to worry about it. In a case like this, a small
gain that happens automatically feels better than a large gain that requires manual
tweaking in every install.

Now that doesn't answer the question of what that heuristic should be. I initially
thought that if compression didn't end up reducing the number of TOAST chunks then
there was no point, but that's not true, because having the last chunk be smaller than
the rest can allow us to fit more than 4 into a page rather than exactly 4. However,
this effect isn't likely to be important if most chunks are full-size chunks. If we
insert 100 full-size chunks and 1 partial chunk at the end, we can't save much even if
that chunk ends up being 1 byte instead of a full-size chunk. 25 TOAST table pages out
of 26 are going to be full of full-size chunks either way, so we can't save more than
~4% and we might easily save nothing at all. As you say, the potential savings are
larger as the values get smaller, because a higher proportion of the TOAST table pages
are not quite full. So I think the only cases where it's even interesting to think
about suppressing this optimization are those where the value is quite small -- and
maybe the thing to do is just pick a conservatively large threshold and call it good.

For example, say that instead of applying this technique when there are at least 2
TOAST chunks, we apply it when there are at least 500 (i.e. normally 1MB). It's hard
for me to imagine that we can ever lose, and in fact I think we could come down quite a
bit from there and still end up winning pretty much all the time. Setting the threshold
to, say, 50 or 100 or 150 TOAST chunks instead of 2 may leave some money on the table,
but if it means that we don't have meaningful downsides and we don't have tunables for
users to fiddle with, it might be worth it.

--
Robert Haas
EDB: http://www.enterprisedb.com
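(A back-of-the-envelope sketch of that upper bound, assuming 4 full-size chunks per 8KB TOAST page; the name and constants are illustrative only, not patch code:)

#include <stdio.h>

#define CHUNKS_PER_PAGE 4            /* assumed full-size TOAST chunks per 8KB page */

/*
 * Rough upper bound on the fraction of TOAST pages that shrinking only the
 * last, partial chunk can save: every page except the one holding that chunk
 * is full of full-size chunks either way.
 */
static double
max_saving(int nchunks)
{
    int pages = (nchunks + CHUNKS_PER_PAGE - 1) / CHUNKS_PER_PAGE;

    return 1.0 / pages;
}

int
main(void)
{
    printf("%.1f%%\n", 100 * max_saving(101));   /* ~3.8%: 100 full chunks + 1 partial */
    printf("%.1f%%\n", 100 * max_saving(5));     /* 50.0%: small values have more to gain */
    return 0;
}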
On Wed, Mar 16, 2022 at 10:08:45AM -0400, Robert Haas wrote:
> A single tunable might be OK, but what I would really like here is a
> heuristic that, while perhaps not optimal in every environment, is good
> enough that a very high percentage of users will never need to worry about
> it. In a case like this, a small gain that happens automatically feels
> better than a large gain that requires manual tweaking in every install.

Agreed. If there is a tunable, it needs to be as simple as possible. And ideally most
users would never need to use it because the default would be "good enough."

> For example, say that instead of applying this technique when there are at
> least 2 TOAST chunks, we apply it when there are at least 500 (i.e. normally
> 1MB). It's hard for me to imagine that we can ever lose, and in fact I think
> we could come down quite a bit from there and still end up winning pretty
> much all the time. Setting the threshold to, say, 50 or 100 or 150 TOAST
> chunks instead of 2 may leave some money on the table, but if it means that
> we don't have meaningful downsides and we don't have tunables for users to
> fiddle with, it might be worth it.

As long as we can demonstrate some decent improvements, I think using a conservative
threshold is a good idea. I do wonder whether this could weaken the read performance
gains quite a bit, though.

Thinking further, is simply reducing the number of TOAST chunks the right thing to look
at? If I want to add a TOAST attribute that requires 100,000 chunks, and you told me
that I could save 10% in the read path for an extra 250 chunks of disk space, I would
probably choose read performance. If I wanted to add 100,000 attributes that were each
3 chunks, and you told me that I could save 10% in the read path for an extra 75,000
chunks of disk space, I might choose the extra disk space. These are admittedly extreme
(and maybe even impossible) examples, but my point is that the amount of disk space you
are willing to give up may be related to the size of the attribute. And maybe one way
to extract additional read performance with this optimization is to use a variable
threshold so that we are more likely to use it for large attributes.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Mar 16, 2022 at 11:36:56AM -0700, Nathan Bossart wrote:
> These are admittedly extreme (and maybe even impossible) examples, but my
> point is that the amount of disk space you are willing to give up may be
> related to the size of the attribute. And maybe one way to extract
> additional read performance with this optimization is to use a variable
> threshold so that we are more likely to use it for large attributes.

I might be overthinking this. Maybe it is enough to skip compressing the attribute
whenever compression saves no more than some percentage of the uncompressed attribute
size. A conservative default setting might be something like 5% or 10%.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Mar 16, 2022 at 2:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> These are admittedly extreme (and maybe even impossible) examples, but my
> point is that the amount of disk space you are willing to give up may be
> related to the size of the attribute. And maybe one way to extract
> additional read performance with this optimization is to use a variable
> threshold so that we are more likely to use it for large attributes.

Right, so perhaps the ultimate thing here would be a more fine-grained knob than SET
STORAGE EXTERNAL -- something that allows you to specify that you want to compress only
when it really helps. While some people might find that useful, I think the current
patch is less ambitious, and I think that's OK. It just wants to save something in the
cases where it's basically free. Unfortunately we've learned that it's never *entirely*
free because making the last TOAST chunk longer can always cost you something, even if
it gets longer by only 1 byte. But for larger values it's hard for that to be
significant.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 17, 2022 at 01:04:17PM -0400, Robert Haas wrote:
> Right, so perhaps the ultimate thing here would be a more fine-grained
> knob than SET STORAGE EXTERNAL -- something that allows you to specify
> that you want to compress only when it really helps. While some people
> might find that useful, I think the current patch is less ambitious,
> and I think that's OK. It just wants to save something in the cases
> where it's basically free. Unfortunately we've learned that it's never
> *entirely* free because making the last TOAST chunk longer can always
> cost you something, even if it gets longer by only 1 byte. But for
> larger values it's hard for that to be significant.

I guess I think we should be slightly more ambitious. One idea could be to create a
default_toast_compression_ratio GUC with a default of 0.95. This means that, by
default, a compressed attribute must be 0.95x or less of the size of the uncompressed
attribute to be stored compressed. Like default_toast_compression, this could also be
overridden at the column level with something like

    ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;

If the current "basically free" patch is intended for v15, then maybe all this extra
configurability stuff could wait for a bit, especially if we can demonstrate a decent
read performance boost with a more conservative setting. However, I don't see anything
terribly complicated about the proposed configurability changes (assuming my proposal
makes some amount of sense to you and others).

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Mar 17, 2022 at 4:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Thu, Mar 17, 2022 at 01:04:17PM -0400, Robert Haas wrote:
> > Right, so perhaps the ultimate thing here would be a more fine-grained
> > knob than SET STORAGE EXTERNAL -- something that allows you to specify
> > that you want to compress only when it really helps.
>
> I guess I think we should be slightly more ambitious. One idea could be to
> create a default_toast_compression_ratio GUC with a default of 0.95. This
> means that, by default, a compressed attribute must be 0.95x or less of the
> size of the uncompressed attribute to be stored compressed. Like
> default_toast_compression, this could also be overridden at the column
> level with something like
>
>     ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;
>
> If the current "basically free" patch is intended for v15, then maybe all
> this extra configurability stuff could wait for a bit, especially if we can
> demonstrate a decent read performance boost with a more conservative
> setting. However, I don't see anything terribly complicated about the
> proposed configurability changes (assuming my proposal makes some amount of
> sense to you and others).

We seem to have a shortage of "others" showing up with opinions on this topic, but I
guess I'm not very confident about the general utility of such a setting.

Just to be clear, I'm also not very confident about the usefulness of the existing
settings for controlling TOAST. Why is it useful default behavior to try to get rows
down to 2kB by default, rather than 1.8kB or 3kB? Even more, why don't we try to
compress primarily based on the length of individual attributes and then compress
further only if the resulting tuple doesn't fit into a page at all? There doesn't seem
to be anything magic about fitting tuples into a quarter-page, yet the code acts as
though that's the primary goal - and then, when that didn't turn out to work well in
all cases, we added a user-settable parameter (toast_tuple_target) to let you say you
really want tuples in table X to fit into a third of a page or a fifth of a page
instead of a quarter. And there's some justification for that: a proposal to
fundamentally change the algorithm would likely have gotten bogged down for years, and
the parameter no doubt lets you solve some problems. Yet if the whole algorithm is
wrong, and I think maybe it is, then being able to change the constants is not really
getting us where we need to be.

Then, too, I'm not very confident about the usefulness of EXTENDED, EXTERNAL, and MAIN.
I think it's useful to be able to categorically disable compression (as EXTERNAL does),
but you can't categorically disable out-of-line storage because the value can be bigger
than the page, so MAIN vs. EXTENDED is just changing the threshold for the use of
out-of-line storage. However, it does so in a way that IMHO is not particularly
intuitive, which goes back to my earlier point about the algorithm seeming wacky, and
it's in any case unclear why we should offer exactly two possible thresholds and not
any of the intermediate values.

I think I'm prepared to concede that the setting you are proposing here is easier to
understand than either of those. Saying "store this value compressed only if you can
thereby reduce the size by at least 10%" is an understandable directive which I think
will make sense even to people who know very little about PostgreSQL or databases
generally, as long as they have some vague idea how compression works. However, I guess
I am not confident that the setting would get much use, or that it is the best way to
measure what we care about. For instance, if my value is 19kB in length, saving 10%
isn't very impactful, because it's still the same number of chunks, and the only
benefit I gain is that the last chunk is smaller, which is probably not very impactful
because I'm still going to have 8 full chunks. I will gain more if the value is bigger
or smaller. Going up, if I increase the size to 19MB or 190MB, I'm really saving a very
significant amount of cache space and, potentially, I/O. If I was happy with a 10%
savings at 19kB, I will likely be happy with a considerably smaller economy here. Going
down, if I decrease the size to 1.9kB, I'm not storing any full-sized chunks any more,
so reducing the size of the one partial chunk that I'm storing seems like it could have
a real impact, provided of course that other toasted values in the same table are
similarly small. If the whole TOAST table is full of just partial chunks, then the
smaller they are the more will fit into a page, all things being equal, and thus the
smaller the TOAST table will be.

Maybe the conclusion here is that more thought is needed before changing anything in
this area....

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Mar 22, 2022 at 04:34:05PM -0400, Robert Haas wrote:
> Then, too, I'm not very confident about the usefulness of EXTENDED,
> EXTERNAL, and MAIN. I think it's useful to be able to categorically
> disable compression (as EXTERNAL does), but you can't categorically
> disable out-of-line storage because the value can be bigger than the
> page, so MAIN vs. EXTENDED is just changing the threshold for the use
> of out-of-line storage. However, it does so in a way that IMHO is not
> particularly intuitive, which goes back to my earlier point about the
> algorithm seeming wacky, and it's in any case unclear why we should
> offer exactly two possible thresholds and not any of the intermediate
> values.

I agree with all of this. Adding configurability for each constant might help folks in
the short term, but using these knobs probably requires quite a bit of expertise in
Postgres internals as well as a good understanding of your data. I think we ought to
revisit TOAST configurability from a user perspective. IOW, what can be chosen
automatically, and how do we enable users to effectively configure the settings that
cannot be chosen automatically? IMO this is a worthwhile conversation to have as long
as it doesn't stray too far into the "let's rewrite TOAST" realm. I think there is
always going to be some level of complexity with stuff like TOAST, but there are
probably all sorts of ways to simplify/improve it also.

> Maybe the conclusion here is that more thought is needed before changing
> anything in this area....

You've certainly got me thinking more about this. If the scope of this work is going to
expand, a few months before the first v16 commitfest is probably the right time for it.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, Mar 22, 2022 at 02:42:53PM -0700, Nathan Bossart wrote:
> On Tue, Mar 22, 2022 at 04:34:05PM -0400, Robert Haas wrote:
>> Then, too, I'm not very confident about the usefulness of EXTENDED,
>> EXTERNAL, and MAIN.
>
> I agree with all of this. Adding configurability for each constant might
> help folks in the short term, but using these knobs probably requires quite
> a bit of expertise in Postgres internals as well as a good understanding of
> your data. I think we ought to revisit TOAST configurability from a user
> perspective. IOW, what can be chosen automatically, and how do we enable
> users to effectively configure the settings that cannot be chosen
> automatically? IMO this is a worthwhile conversation to have as long as it
> doesn't stray too far into the "let's rewrite TOAST" realm. I think there
> is always going to be some level of complexity with stuff like TOAST, but
> there are probably all sorts of ways to simplify/improve it also.

I agree with this feeling. TOAST already has too many configuration parameters that
each behave in their own slightly different way. If we could reduce that number rather
than increase it, so much the better. I would also be curious to see how relevant those
parameters remain with more compression options possible for toast values.

--
Michael
On Wed, Mar 23, 2022 at 9:58 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Mar 22, 2022 at 02:42:53PM -0700, Nathan Bossart wrote:
> > I agree with all of this. Adding configurability for each constant might
> > help folks in the short term, but using these knobs probably requires quite
> > a bit of expertise in Postgres internals as well as a good understanding of
> > your data. I think we ought to revisit TOAST configurability from a user
> > perspective.
>
> I agree with this feeling. TOAST already has too many configuration
> parameters that each behave in their own slightly different way. If we
> could reduce that number rather than increase it, so much the better. I
> would also be curious to see how relevant those parameters remain with
> more compression options possible for toast values.

I agree with you; it doesn't make much sense here to add an extra user-visible
configuration parameter when it is not really clear to users how to configure it to get
the best out of it.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Mar 18, 2022 at 1:35 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I guess I think we should be slightly more ambitious. One idea could be to
> create a default_toast_compression_ratio GUC with a default of 0.95. This
> means that, by default, a compressed attribute must be 0.95x or less of the
> size of the uncompressed attribute to be stored compressed. Like
> default_toast_compression, this could also be overridden at the column
> level with something like
>
>     ALTER TABLE mytbl ALTER mycol SET COMPRESSION lz4 RATIO 0.9;

I am not sure that we want a GUC to control that, but we can certainly be more
ambitious. Basically, in the current patch, if the data is even slightly large then we
would always prefer to store the compressed data; e.g., if the data size is 200kB then
even if the compression ratio is as low as 1% we would choose to store the compressed
data. I think we can make it based on the compression ratio and then upper-bound it
with the chunk difference. For example, if the compression ratio is < 10%, then store
the data uncompressed only if the chunk difference is < threshold. But with that we
might see a performance impact on smaller data whose compression ratio is < 10%,
because its chunk difference will always be under the threshold. So maybe the chunk
difference threshold could be a function of the total number of chunks required for the
data, perhaps a logarithmic function so that the threshold grows slowly along with the
base data size.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
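(As a purely illustrative sketch of that last idea -- the function name, constants, and choice of logarithm are made up, not part of any patch:)

#include <math.h>
#include <stdio.h>

/*
 * Hypothetical threshold: how many chunks compression must save before we
 * bother storing the compressed form, growing slowly (logarithmically) with
 * the total number of chunks the uncompressed data would need.
 */
static int
chunk_saving_threshold(int total_chunks)
{
    int threshold = (int) ceil(log2((double) total_chunks));

    return threshold < 1 ? 1 : threshold;
}

int
main(void)
{
    printf("%d\n", chunk_saving_threshold(4));       /* 2 */
    printf("%d\n", chunk_saving_threshold(100));     /* 7 */
    printf("%d\n", chunk_saving_threshold(100000));  /* 17 */
    return 0;
}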