Optimize external TOAST storage - Mailing list pgsql-hackers
From: davinder singh
Subject: Optimize external TOAST storage
Msg-id: CAHzhFSEQ8Vi8pEWeZERPajCbOfvvae2Si5imxftEGhXp6px5iQ@mail.gmail.com
Responses: Re: Optimize external TOAST storage
List: pgsql-hackers
Hi,
For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
and then divided into chunks. The problem with storing the compressed value is
that, if compression does not save enough space to reduce the number of chunks,
we end up paying an extra decompression cost on every read.
Based on the discussion with Robert and Dilip, we tried to optimize this
process. The idea is to TOAST the compressed value only if it saves at least
one chunk (2KB by default) of disk storage; otherwise, store the uncompressed
value. In this way, we save decompression costs without losing much on storage.
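To illustrate the rule with the numbers from test 2 below, here is a rough
back-of-the-envelope check, assuming the 2KB default chunk size (approximated
as 2000 bytes; the exact TOAST_MAX_CHUNK_SIZE depends on the build):
-- Chunks needed for the test 2 attribute with and without compression,
-- assuming a ~2000-byte chunk.
select ceil(16505 / 2000.0) as uncompressed_chunks,  -- 9
       ceil(16050 / 2000.0) as compressed_chunks;    -- 9
-- Compression saves no chunk here, so the patch stores the value uncompressed
-- and every read skips decompression; the extra disk usage per value stays
-- below one chunk (~2KB).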
In our tests, we have observed an improvement in read performance of up to 28%,
while giving up at most TOAST_MAX_CHUNK_SIZE (2KB) bytes of disk storage per
value. The gain is larger for large attributes (size > 4KB) because the
compression/decompression cost increases with size.
However, we have found that this assumption does not hold when the data
compression ratio is high and the size is barely big enough to cause TOASTing.
For example, in test 4.b below, we did not get any performance advantage, but
the table size grew by 42% because uncompressed values were stored.
Test Setup:
Create table t1_lz4 (a text compression lz4, b text compression lz4);
-- Generate random data
create or replace function generate_att_data(len_info int)
returns text
language plpgsql
as
$$
declare
value text;
begin
-- Concatenate 32-character md5 strings into an array whose text form is
-- roughly len_info bytes long (about 33 bytes per element with separators).
select array_agg(md5(g::text))
into value
from generate_series(1, round(len_info/33)::int) g;
return value;
end;
$$;
--Test
Select b from t1_lz4;
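As a sanity check (not part of the original test), one can verify the generator
and see how a value of b was actually stored: pg_column_size reports the stored
datum size and pg_column_compression (PostgreSQL 14+) the compression method
applied, if any.
-- The generator returns a string of roughly the requested length.
select length(generate_att_data(16505));
-- Raw length vs. stored size vs. compression method for a sample value of b.
select octet_length(b) as raw_bytes,
       pg_column_size(b) as stored_bytes,
       pg_column_compression(b) as method
from t1_lz4 limit 1;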
Test 1:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
Summary:
Attribute size: original: 7925 bytes, after compression: 7845 bytes
Time for select: head: 42 sec, patch: 37 sec, Performance Gain: 11%
table size: Head 1662 MB, Patch: 1662 MB
Test 2:
Data: rows 100000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
Summary:
Attribute size: original: 16505 bytes, after compression: 16050 bytes
Time for select: head: 35.6 sec, patch: 30 sec, Performance Gain: 14%
table size: Head 1636 MB, Patch: 1688 MB
Test 3:
Data: rows 50000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
Summary:
Attribute size: original: 31685 bytes, after compression: 30263 bytes
Time for select: head: 35.4 sec, patch: 25.5 sec, Performance Gain: 28%
table size: Head 1601 MB, Patch: 1601 MB
Test 4.a:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250) || generate_att_data(3885);
Summary:
Attribute size: original: 3885 bytes, after compression: 3645 bytes
Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
table size: Head 872 MB, Patch: 872 MB
Test 4.b (High compression):
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
Summary:
Attribute size: original: 3966 bytes, after compression: 2012 bytes
Time for select: head: 27 sec, patch: 26 sec, Performance Gain: 0%
table size: Head 612 MB, Patch: 872 MB
This is the worst case for this optimization, for two reasons.
First, the table size grows by about 50% when the compressed size is half of the
original size and yet barely large enough to cause TOASTing. The table size
cannot grow more than that, because if compression reduced the size any further
it would also reduce the number of chunks, and we would then store the
compressed value.
Second, there is not much gain in performance, because with such a small
attribute almost twice as many values fit on a page, so each page access
returns roughly twice as many rows, and a small value also means low
compression/decompression cost.
We have avoided such cases by applying the optimization only when the attribute
size is > 4KB.
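The same chunk arithmetic, using the test 4.b numbers and again approximating
the 2KB chunk as 2000 bytes, shows why this case is excluded:
-- Test 4.b: compression halves the value but still needs the same number of chunks.
select ceil(3966 / 2000.0) as uncompressed_chunks,  -- 2
       ceil(2012 / 2000.0) as compressed_chunks;    -- 2
-- No chunk is saved, so the chunk-saving rule alone would store 3966 bytes
-- instead of 2012 bytes per value (almost double the TOAST payload) for almost
-- no read gain; since 3966 bytes < 4KB, the size cutoff keeps such values
-- compressed as before.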
Regards,
Davinder