[proposal] de-TOAST'ing using a iterator - Mailing list pgsql-hackers

From Binguo Bao
Subject [proposal] de-TOAST'ing using a iterator
Date
Msg-id CAL-OGks_onzpc9M9bXPCztMofWULcFkyeCeKiAgXzwRL8kXiag@mail.gmail.com
Whole thread Raw
Responses Re: [proposal] de-TOAST'ing using a iterator
List pgsql-hackers
Hi hackers!
This proposal aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator and then update the appropriate parts of the code to use the iterator where possible instead of de-TOAST'ing and de-compressing the entire value. Examples where this can be helpful include using position() from the beginning of the value, or doing a pattern or substring match.

de-TOAST iterator overview:
1. The caller requests the slice of the attribute value from the de-TOAST iterator.
2. The de-TOAST iterator checks if there is a slice available in the output buffer, if there is, return the result directly,
    otherwise goto the step3.
3. The de-TOAST iterator checks if there is the slice available in the input buffer, if there is, goto step44. Otherwise,
    call fetch_datum_iterator to fetch datums from disk to input buffer.
4. If the data in the input buffer is compressed, extract some data from the input buffer to the output buffer until the caller's
    needs are met.

I've implemented the prototype and apply it to the position() function to test performance.
Test tables:
-----------------------------------------------------------------------------------------------------
create table detoast_c (id serial primary key,
a text
);
insert into detoast_c (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from generate_series(1,100);

create table detoast_u (id serial primary key,
a text
);
alter table detoast_u alter a set storage external;
insert into detoast_u (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from generate_series(1,100);
**************************************************************************************
-----------------------------------------------------------------------------------------------------
                         query                                    |     master (ms)  |  patch  (ms)  |
-----------------------------------------------------------------------------------------------------
select position('123' in a) from detoast_c;    |     4054.838       |    1440.735   |
-----------------------------------------------------------------------------------------------------
select position('321' in a) from detoast_c;    |     25549.270     |   27696.245  |
-----------------------------------------------------------------------------------------------------
select position('123' in a) from detoast_u;    |     8116.996       |    1386.802   |
-----------------------------------------------------------------------------------------------------
select position('321' in a) from detoast_u     |     28442.116     |   27672.319  |
-----------------------------------------------------------------------------------------------------
**************************************************************************************
It can be seen that the iterator greatly improves the efficiency of partial de-TOAST when it has almost no degradation in full de-TOAST efficiency.
Next, I will continue to study how to apply iterators to more queries 
and improve iterator efficiency, such as using macros instead of function calls.

The patch is also available on github[1].
Any suggestions or comments would be much appreciated:)

Best regards, Binguo Bao.

Attachment

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs
Next
From: Dmitry Dolgov
Date:
Subject: Re: Index Skip Scan