Re: Optimize external TOAST storage - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Optimize external TOAST storage
Msg-id CAFiTN-vDda8RnakbHK5z9LNWQ3buKzOhRMXsZEcNqLuQCGdQAw@mail.gmail.com
In response to Optimize external TOAST storage  (davinder singh <davindersingh2692@gmail.com>)
Responses Re: Optimize external TOAST storage  (davinder singh <davindersingh2692@gmail.com>)
List pgsql-hackers
On Mon, Feb 28, 2022 at 3:22 PM davinder singh
<davindersingh2692@gmail.com> wrote:
>
> Hi,
>
> For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
> and then divided into chunks. The problem with storing the compressed value is
> that if compression does not save enough space to reduce the number of chunks,
> we end up paying an extra decompression cost on every read.
> Based on the discussion with Robert and Dilip, we tried to optimize this
> process. The idea is to TOAST the compressed value only if it saves at least
> one chunk (2KB by default) of disk storage; otherwise, store the uncompressed
> value. In this way, we save decompression costs without losing much on storage.
>
> In our tests, we have observed read-performance improvements of up to 28% by
> giving up at most TOAST_MAX_CHUNK_SIZE (2KB) bytes of storage on disk. The
> gain is larger for large attributes (size > 4KB) because compression/decompression
> cost increases with size.
> However, we have found that this assumption does not hold when the data
> compresses well and the value is barely big enough to trigger TOASTing. For example,
> in test 4.b below, we did not get any performance advantage, but the table
> size grew by 42% from storing uncompressed values.
> Test Setup.
>
> Create table t1_lz4 ( a text compression lz4, b text compression lz4);
> -- Generate random data
> create or replace function generate_att_data(len_info int)
> returns text
> language plpgsql
> as
> $$
> declare
>    value text;
> begin
>    select array_agg(md5(g::text))
>    into value
>    from generate_series(1, round(len_info/33)::int) g;
>    return value;
> end;
> $$;
>
> --Test
> Select b from t1_lz4;
>
> Test 1:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 7925 bytes, after compression: 7845 bytes
> Time for select: head: 42  sec, patch: 37 sec, Performance Gain: 11%
> table size: Head 1662 MB, Patch: 1662 MB
>
> Test 2:
> Data: rows 100000
> insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
> Summary:
> Attribute size: original: 16505 bytes, after compression: 16050 bytes
> Time for select: head: 35.6  sec, patch: 30 sec, Performance Gain: 14%
> table size: Head 1636 MB, Patch: 1688 MB
>
> Test 3:
> Data: rows 50000
> insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
> Summary:
> Attribute size: original: 31685 bytes, after compression: 30263 bytes
> Time for select: head: 35.4  sec, patch: 25.5 sec, Performance Gain: 28%
> table size: Head 1601 MB, Patch: 1601 MB
>
> Test 4.a:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250) || generate_att_data(3885);
> Summary:
> Attribute size: original: 3885 bytes, after compression: 3645 bytes
> Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
> table size: Head 872 MB, Patch: 872 MB
>
> Test 4.b (High compression):
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 3966 bytes, after compression: 2012 bytes
> Time for select: head: 27  sec, patch: 26 sec, Performance Gain: 0%
> table size: Head 612 MB, Patch: 872 MB

I think the idea looks interesting and the results are also promising.
I have a few initial comments; I will do a more detailed review later.
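For readers following along, the decision rule described above (keep the compressed value only when it saves at least one whole TOAST chunk) can be sketched with the arithmetic below. This is an illustrative model, not the patch itself, and it assumes a chunk size of 2000 bytes; the real TOAST_MAX_CHUNK_SIZE is derived from the block size and is only approximately 2KB.

```python
import math

# Illustrative chunk size; the actual TOAST_MAX_CHUNK_SIZE is
# block-size dependent and close to, but not exactly, 2000 bytes.
TOAST_MAX_CHUNK_SIZE = 2000

def chunks_needed(nbytes: int) -> int:
    """Number of TOAST chunks required to store nbytes."""
    return math.ceil(nbytes / TOAST_MAX_CHUNK_SIZE)

def store_compressed(raw_size: int, compressed_size: int) -> bool:
    """Keep the compressed value only if it saves at least one chunk."""
    return chunks_needed(compressed_size) < chunks_needed(raw_size)

# Test 2 above: 16505 -> 16050 bytes saves no whole chunk (9 chunks
# either way), so the value would be stored uncompressed -- which is
# consistent with the larger table size reported for the patch.
print(store_compressed(16505, 16050))  # False
# A hypothetical 16505 -> 14000 compression saves chunks, so the
# compressed form would be kept.
print(store_compressed(16505, 14000))  # True
```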

1.
+    if (*value == orig_toast_value)
+        toast_tuple_externalize(ttc, attribute, options);

Wouldn't it be cleaner to check whether the attribute is compressed,
like this:
VARATT_IS_COMPRESSED(DatumGetPointer(*value))?

2.
+            /* Using the uncompressed data instead, deleting compressed data. */
+            pfree(DatumGetPointer(*value));

Change as below:
s/deleting compressed data/releasing memory for the compressed data/

3.
+    memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+    memcpy(toast_values_copy, toast_values, sizeof(toast_values));
+

Add a comment here explaining what we are copying and why.

4.
+            /* incompressible, ignore on subsequent compression passes. */
+            orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;

Do we need to set TOASTCOL_INCOMPRESSIBLE while trying to externalize
the attribute? The comment says "ignore on subsequent compression passes",
but will there be any more compression passes after this point? If we do
need to set TOASTCOL_INCOMPRESSIBLE here, the comment should explain why.




-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


