Thread: [proposal] de-TOAST'ing using a iterator

[proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

19 June 2019, 13:51:27

Hi hackers!

This proposal aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator and then update the appropriate parts of the code to use the iterator where possible instead of de-TOAST'ing and de-compressing the entire value. Examples where this can be helpful include using position() from the beginning of the value, or doing a pattern or substring match.

de-TOAST iterator overview:

1. The caller requests the slice of the attribute value from the de-TOAST iterator.

2. The de-TOAST iterator checks if there is a slice available in the output buffer, if there is, return the result directly,
otherwise goto the step3.

3. The de-TOAST iterator checks if there is the slice available in the input buffer, if there is, goto step44. Otherwise,

call fetch_datum_iterator to fetch datums from disk to input buffer.

4. If the data in the input buffer is compressed, extract some data from the input buffer to the output buffer until the caller's

needs are met.

I've implemented the prototype and apply it to the position() function to test performance.

Test tables:

-----------------------------------------------------------------------------------------------------

create table detoast_c (id serial primary key,
a text
);

insert into detoast_c (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from generate_series(1,100);

create table detoast_u (id serial primary key,
a text
);
alter table detoast_u alter a set storage external;

insert into detoast_u (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from generate_series(1,100);

**************************************************************************************

-----------------------------------------------------------------------------------------------------

query | master (ms) | patch (ms) |

-----------------------------------------------------------------------------------------------------

select position('123' in a) from detoast_c; | 4054.838 | 1440.735 |

-----------------------------------------------------------------------------------------------------

select position('321' in a) from detoast_c; | 25549.270 | 27696.245 |

-----------------------------------------------------------------------------------------------------

select position('123' in a) from detoast_u; | 8116.996 | 1386.802 |

-----------------------------------------------------------------------------------------------------

select position('321' in a) from detoast_u | 28442.116 | 27672.319 |

-----------------------------------------------------------------------------------------------------

**************************************************************************************

It can be seen that the iterator greatly improves the efficiency of partial de-TOAST when it has almost no degradation in full de-TOAST efficiency.

Next, I will continue to study how to apply iterators to more queries

and improve iterator efficiency, such as using macros instead of function calls.

The patch is also available on github[1].

Any suggestions or comments would be much appreciated:)

Best regards, Binguo Bao.

[1] https://github.com/djydewang/postgres/pull/1/files

Attachment

0001-de-TOASTing-using-a-iterator.patch

Re: [proposal] de-TOAST'ing using a iterator

From

Thomas Munro

Date:

05 July 2019, 04:21:17

On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao <djydewang@gmail.com> wrote:
> Hi hackers!
> This proposal aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator and then
updatethe appropriate parts of the code to use the iterator where possible instead of de-TOAST'ing and de-compressing
theentire value. Examples where this can be helpful include using position() from the beginning of the value, or doing
apattern or substring match. 
>
> de-TOAST iterator overview:
> 1. The caller requests the slice of the attribute value from the de-TOAST iterator.
> 2. The de-TOAST iterator checks if there is a slice available in the output buffer, if there is, return the result
directly,
>     otherwise goto the step3.
> 3. The de-TOAST iterator checks if there is the slice available in the input buffer, if there is, goto step44.
Otherwise,
>     call fetch_datum_iterator to fetch datums from disk to input buffer.
> 4. If the data in the input buffer is compressed, extract some data from the input buffer to the output buffer until
thecaller's 
>     needs are met.
>
> I've implemented the prototype and apply it to the position() function to test performance.

Hi Binguo,

Interesting work, and nice performance improvements so far.  Just by
the way, the patch currently generates warnings:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719

--
Thomas Munro
https://enterprisedb.com

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

10 July 2019, 14:18:24

Hi Thomas,

I've fixed the warnings.

Thomas Munro <thomas.munro@gmail.com> 于2019年7月5日周五下午12:21写道：

On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao <djydewang@gmail.com> wrote:
> Hi hackers!
> This proposal aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator and then update the appropriate parts of the code to use the iterator where possible instead of de-TOAST'ing and de-compressing the entire value. Examples where this can be helpful include using position() from the beginning of the value, or doing a pattern or substring match.
>
> de-TOAST iterator overview:
> 1. The caller requests the slice of the attribute value from the de-TOAST iterator.
> 2. The de-TOAST iterator checks if there is a slice available in the output buffer, if there is, return the result directly,
> otherwise goto the step3.
> 3. The de-TOAST iterator checks if there is the slice available in the input buffer, if there is, goto step44. Otherwise,
> call fetch_datum_iterator to fetch datums from disk to input buffer.
> 4. If the data in the input buffer is compressed, extract some data from the input buffer to the output buffer until the caller's
> needs are met.
>
> I've implemented the prototype and apply it to the position() function to test performance.

Hi Binguo,

Interesting work, and nice performance improvements so far. Just by
the way, the patch currently generates warnings:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719

--
Thomas Munro
https://enterprisedb.com

Attachment

0001-de-TOASTing-using-a-iterator-2.patch

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

10 July 2019, 16:39:05

This is the patch that fix warnings.

Best Regards,

Binguo Bao

Binguo Bao <djydewang@gmail.com> 于2019年7月10日周三下午10:18写道：

Hi Thomas,
I've fixed the warnings.

Thomas Munro <thomas.munro@gmail.com> 于2019年7月5日周五下午12:21写道：
On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao <djydewang@gmail.com> wrote:
> Hi hackers!
> This proposal aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator and then update the appropriate parts of the code to use the iterator where possible instead of de-TOAST'ing and de-compressing the entire value. Examples where this can be helpful include using position() from the beginning of the value, or doing a pattern or substring match.
>
> de-TOAST iterator overview:
> 1. The caller requests the slice of the attribute value from the de-TOAST iterator.
> 2. The de-TOAST iterator checks if there is a slice available in the output buffer, if there is, return the result directly,
> otherwise goto the step3.
> 3. The de-TOAST iterator checks if there is the slice available in the input buffer, if there is, goto step44. Otherwise,
> call fetch_datum_iterator to fetch datums from disk to input buffer.
> 4. If the data in the input buffer is compressed, extract some data from the input buffer to the output buffer until the caller's
> needs are met.
>
> I've implemented the prototype and apply it to the position() function to test performance.

Hi Binguo,

Interesting work, and nice performance improvements so far. Just by
the way, the patch currently generates warnings:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719

--
Thomas Munro
https://enterprisedb.com

Attachment

0001-de-TOASTing-using-a-iterator-3.patch

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

11 July 2019, 09:23:24

I have set the local build configuration to be the same as on the CI. This patch should be correct.

Best regards,

Binguo Bao

Binguo Bao <djydewang@gmail.com> 于2019年7月11日周四上午12:39写道：

This is the patch that fix warnings.

Best Regards,
Binguo Bao

Binguo Bao <djydewang@gmail.com> 于2019年7月10日周三下午10:18写道：
Hi Thomas,
I've fixed the warnings.

Thomas Munro <thomas.munro@gmail.com> 于2019年7月5日周五下午12:21写道：
On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao <djydewang@gmail.com> wrote:
> Hi hackers!
> This proposal aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator and then update the appropriate parts of the code to use the iterator where possible instead of de-TOAST'ing and de-compressing the entire value. Examples where this can be helpful include using position() from the beginning of the value, or doing a pattern or substring match.
>
> de-TOAST iterator overview:
> 1. The caller requests the slice of the attribute value from the de-TOAST iterator.
> 2. The de-TOAST iterator checks if there is a slice available in the output buffer, if there is, return the result directly,
> otherwise goto the step3.
> 3. The de-TOAST iterator checks if there is the slice available in the input buffer, if there is, goto step44. Otherwise,
> call fetch_datum_iterator to fetch datums from disk to input buffer.
> 4. If the data in the input buffer is compressed, extract some data from the input buffer to the output buffer until the caller's
> needs are met.
>
> I've implemented the prototype and apply it to the position() function to test performance.

Hi Binguo,

Interesting work, and nice performance improvements so far. Just by
the way, the patch currently generates warnings:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719

--
Thomas Munro
https://enterprisedb.com

Attachment

0001-de-TOASTing-using-a-iterator-4.patch

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

15 July 2019, 11:20:14

On Wed, Jun 19, 2019 at 8:51 PM Binguo Bao <djydewang@gmail.com> wrote:
> [v4 patch]

Hi Binguo,

I can verify I get no warnings with the v4 patch. I've done some
additional performance testing. First, to sum up your results:

> insert into detoast_c (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from
generate_series(1,100);

When the search pattern was at the beginning, the patch was several
times faster , and when the pattern was at the end, it was 3% slower
when uncompressed and 9% slower when compressed.

First, I'd like to advocate for caution when using synthetic
benchmarks involving compression. Consider this test:

insert into detoast_c (a)
select
    'abc'||
    repeat(
    (SELECT string_agg(md5(chr(i)), '')
    FROM generate_series(1,127) i)
    , 10000)
    ||'xyz'
from generate_series(1,100);

The results for the uncompressed case were not much different then
your test. However, in the compressed case the iterator doesn't buy us
much with beginning searches since full decompression is already fast:

                 master          patch
comp. beg.        869ms          837ms
comp. end       14100ms        16100ms
uncomp. beg.     6360ms          800ms
uncomp. end     21100ms        21400ms

and with compression it's 14% slower searching to the end. This is
pretty contrived, but I include it for demonstration.

To test something hopefully a bit more realistic, I loaded 100 records
each containing the 1995 CIA fact book (~3MB of ascii) with a pattern
string put at the beginning and end. For the end search, I used a
longer needle to speed up the consumption of text, hoping to put more
stress on the detoasting algorithms, for example:

select max(position('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
in a)) from detoast_*;

comp. beg.        836ms           22ms
comp. end        1510ms         1700ms
uncomp. beg.      185ms           12ms
uncomp. end       851ms          903ms

Here, the "beginning" case is ~15-35x faster, which is very impressive
and much faster than with your generated contents. The "end" case is
up to 13% slower. It would be interesting to see where the break-even
point is, where the results are the same.

Reading the thread where you're working on optimizing partial
decompression [1], it seems you have two separate solutions for the
two problems. Maybe this is fine, but I'd like to bring up the
possibility of using the same approach for both kinds of callers.

I'm not an expert on TOAST, but maybe one way to solve both problems
is to work at the level of whole TOAST chunks. In that case, the
current patch would look like this:

1. The caller requests more of the attribute value from the de-TOAST iterator.
2. The iterator gets the next chunk and either copies or decompresses
the whole chunk into the buffer. (If inline, just decompress the whole
thing)

This seems simpler and also easy to adapt to callers that do know how
big a slice they want. I also suspect this way would be easier to
adapt to future TOAST formats not tied to heap or to a certain
compression algorithm. With less bookkeepping overhead, maybe there'll
be less worst-case performance degradation, while not giving up much
in the best case. (Note also that commit 9556aa01c6 already introduced
some performance degradation in near-end searches, when using
multibyte strings. This patch would add to that.) The regression
doesn't seem large, but I see more than your test showed, and it would
be nice to avoid it.

Thoughts, anyone?

[1]
https://www.postgresql.org/message-id/flat/CAL-OGkux7%2BBm_J%3Dt5VpH7fJGGSm%2BPxWJtgs1%2BWU2g6cmLru%3D%3DA%40mail.gmail.com#705d074aa4ae305ed3d992b7e5b7af3c

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

16 July 2019, 14:14:41

Hi, John

First, I'd like to advocate for caution when using synthetic
benchmarks involving compression. Consider this test:
insert into detoast_c (a)
select
'abc'||
repeat(
(SELECT string_agg(md5(chr(i)), '')
FROM generate_series(1,127) i)
, 10000)
||'xyz'
from generate_series(1,100);
The results for the uncompressed case were not much different then
your test. However, in the compressed case the iterator doesn't buy us
much with beginning searches since full decompression is already fast:
master patch
comp. beg. 869ms 837ms
comp. end 14100ms 16100ms
uncomp. beg. 6360ms 800ms
uncomp. end 21100ms 21400ms
and with compression it's 14% slower searching to the end. This is
pretty contrived, but I include it for demonstration.

I've reproduced the test case with test scripts in the attachment on my laptop:

master patch
comp. beg. 2686.77 ms 1532.79 ms
comp. end 17971.8 ms 21206.3 ms
uncomp. beg. 8358.79 ms 1556.93 ms
uncomp. end 23559.7 ms 22547.1 ms

In the compressed beginning case, the test result is different from yours since the patch is ~1.75x faster

rather than no improvement. The interesting thing is that the patch if 4% faster than master in the uncompressed end case.

I can't figure out reason now.

Reading the thread where you're working on optimizing partial
decompression [1], it seems you have two separate solutions for the
two problems. Maybe this is fine, but I'd like to bring up the
possibility of using the same approach for both kinds of callers.

I'm not an expert on TOAST, but maybe one way to solve both problems
is to work at the level of whole TOAST chunks. In that case, the
current patch would look like this:
1. The caller requests more of the attribute value from the de-TOAST iterator.
2. The iterator gets the next chunk and either copies or decompresses
the whole chunk into the buffer. (If inline, just decompress the whole
thing)

Thanks for your suggestion. It is indeed possible to implement PG_DETOAST_DATUM_SLICE using the de-TOAST iterator.

IMO the iterator is more suitable for situations where the caller doesn't know the slice size. If the caller knows the slice size,

it is reasonable to fetch enough chunks at once and then decompress it at once.

Best regards,

Binguo Bao

Attachment

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

18 July 2019, 03:39:48

On Tue, Jul 16, 2019 at 9:14 PM Binguo Bao <djydewang@gmail.com> wrote:
> In the compressed beginning case, the test result is different from yours since the patch is ~1.75x faster
> rather than no improvement. The interesting thing is that the patch if 4% faster than master in the uncompressed end
case.
> I can't figure out reason now.

Probably some differences in our test environments. I wouldn't worry
about it too much, since we can show improvement in more realistic
tests.

>> I'm not an expert on TOAST, but maybe one way to solve both problems
>> is to work at the level of whole TOAST chunks. In that case, the
>> current patch would look like this:
>> 1. The caller requests more of the attribute value from the de-TOAST iterator.
>> 2. The iterator gets the next chunk and either copies or decompresses
>> the whole chunk into the buffer. (If inline, just decompress the whole
>> thing)
>
>
> Thanks for your suggestion. It is indeed possible to implement PG_DETOAST_DATUM_SLICE using the de-TOAST iterator.
> IMO the iterator is more suitable for situations where the caller doesn't know the slice size. If the caller knows
theslice size,
 
> it is reasonable to fetch enough chunks at once and then decompress it at once.

That sounds reasonable for the reason of less overhead.

In the case where we don't know the slice size, how about the other
aspect of my question above: Might it be simpler and less overhead to
decompress entire chunks at a time? If so, I think it would be
enlightening to compare performance.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

25 July 2019, 15:20:50

Hi John!

Sorry for the late reply. It took me some time to fix a random bug.

In the case where we don't know the slice size, how about the other
aspect of my question above: Might it be simpler and less overhead to
decompress entire chunks at a time? If so, I think it would be
enlightening to compare performance.

Good idea. I've tested your propopal with scripts and patch v5 in the attachment:

master patch v4 patch v5

comp. beg. 4364ms 1505ms 1529ms

comp. end 28321ms 31202ms 26916ms
uncomp. beg. 3474ms 1513ms 1523ms
uncomp. end 27416ms 30260ms 25888ms

The proposal improves suffix query performance greatly

with less calls to the decompression function.

Besides, do you have any other suggestions for the structure of DetoastIterator or ToastBuffer?

Maybe they can be designed to be more reasonable.

Thanks again for the proposal.

Best regards,

Binguo Bao

Attachment

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

29 July 2019, 03:48:52

On Thu, Jul 25, 2019 at 10:21 PM Binguo Bao <djydewang@gmail.com> wrote:
>
> Hi John!
> Sorry for the late reply. It took me some time to fix a random bug.

Don't worry, it's not late at all! :-)

>> In the case where we don't know the slice size, how about the other
>> aspect of my question above: Might it be simpler and less overhead to
>> decompress entire chunks at a time? If so, I think it would be
>> enlightening to compare performance.
>
>
> Good idea. I've tested  your propopal with scripts and patch v5 in the attachment:
>
>                   master          patch v4          patch v5
> comp. beg.        4364ms          1505ms          1529ms
> comp. end       28321ms        31202ms          26916ms
> uncomp. beg.     3474ms          1513ms          1523ms
> uncomp. end     27416ms        30260ms          25888ms
>
> The proposal improves suffix query performance greatly
> with less calls to the decompression function.

Looks good. I repeated my CIA fact book test and found no difference
with compression, but found that suffix search in the uncompressed
case had less regression (~5%) than v4 (>8%). Let's pursue this
further.

> Besides, do you have any other suggestions for the structure of DetoastIterator or ToastBuffer?

My goal for this stage of review was to understand more fully what the
code is doing, and make it as simple and clear as possible, starting
at the top level. In doing so, it looks like I found some additional
performance gains. I haven't looked much yet at the TOAST fetching
logic.

1). For every needle comparison, text_position_next_internal()
calculates how much of the value is needed and passes that to
detoast_iterate(), which then calculates if it has to do something or
not. This is a bit hard to follow. There might also be a performance
penalty -- the following is just a theory, but it sounds plausible:
The CPU can probably correctly predict that detoast_iterate() will
usually return the same value it did last time, but it still has to
call the function and make sure, which I imagine is more expensive
than advancing the needle. Ideally, we want to call the iterator only
if we have to.

In the attached patch (applies on top of your v5),
text_position_next_internal() simply compares hptr to the detoast
buffer limit, and calls detoast_iterate() until it can proceed. I
think this is clearer. (I'm not sure of the error handling, see #2.)
In this scheme, the only reason to know length is to pass to
pglz_decompress_iterate() in the case of in-line compression. As I
alluded to in my first review, I don't think it's worth the complexity
to handle that iteratively since the value is only a few kB. I made it
so in-line datums are fully decompressed as in HEAD and removed struct
members to match. I also noticed that no one updates or looks at
"toast_iter.done" so I removed that as well.

Now pglz_decompress_iterate() doesn't need length at all. For testing
I just set decompress_all = true and let the compiler optimize away
the rest. I left finishing it for you if you agree with these changes.

With this additional patch, the penalty for suffix search in my CIA
fact book test is only ~2% in the compressed case, and might even be
slightly faster than HEAD in the uncompressed case.

2). detoast_iterate() and fetch_datum_iterate() return a value but we
don't check it or do anything with it. Should we do something with it?
It's also not yet clear if we should check the iterator state instead
of return values. I've added some XXX comments as a reminder. We
should also check the return value of pglz_decompress_iterate().

3). Speaking of pglz_decompress_iterate(), I diff'd it with
pglz_decompress(), and I have some questions on it:

a).
+ srcend = (const unsigned char *) (source->limit == source->capacity
? source->limit : (source->limit - 4));

What does the 4 here mean in this expression? Is it possible it's
compensating for this bit in init_toast_buffer()?

+ buf->limit = VARDATA(buf->buf);

It seems the initial limit should also depend on whether the datum is
compressed, right? Can we just do this:

+ buf->limit = buf->position;

b).
- while (sp < srcend && dp < destend)
...
+ while (sp + 1 < srcend && dp < destend &&
...

Why is it here "sp + 1"?

4. Note that varlena.c has a static state variable, and a cleanup
function that currently does:

static void
text_position_cleanup(TextPositionState *state)
{
/* no cleanup needed */
}

It seems to be the detoast iterator could be embedded in this state
variable, and then free-ing can happen here. That has a possible
advantage that the iterator struct would be on the same cache line as
the state data. That would also remove the need to pass "iter" as a
parameter, since these functions already pass "state". I'm not sure if
this would be good for other users of the iterator, so maybe we can
hold off on that for now.

5. Would it be a good idea to add tests (not always practical), or
more Assert()'s? You probably already know this, but as a reminder
it's good to develop with asserts enabled, but never build with them
for performance testing.

I think that's enough for now. If you have any questions or
counter-arguments, let me know. I've set the commitfest entry to
waiting on author.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

0002-addendum-to-detoast-v5.patch

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

30 July 2019, 13:20:20

John Naylor <john.naylor@2ndquadrant.com> 于2019年7月29日周一上午11:49写道：

On Thu, Jul 25, 2019 at 10:21 PM Binguo Bao <djydewang@gmail.com> wrote:
My goal for this stage of review was to understand more fully what the

code is doing, and make it as simple and clear as possible, starting
at the top level. In doing so, it looks like I found some additional
performance gains. I haven't looked much yet at the TOAST fetching
logic.

1). For every needle comparison, text_position_next_internal()
calculates how much of the value is needed and passes that to
detoast_iterate(), which then calculates if it has to do something or
not. This is a bit hard to follow. There might also be a performance
penalty -- the following is just a theory, but it sounds plausible:
The CPU can probably correctly predict that detoast_iterate() will
usually return the same value it did last time, but it still has to
call the function and make sure, which I imagine is more expensive
than advancing the needle. Ideally, we want to call the iterator only
if we have to.

In the attached patch (applies on top of your v5),
text_position_next_internal() simply compares hptr to the detoast
buffer limit, and calls detoast_iterate() until it can proceed. I
think this is clearer.

Yes, I think this is a general scenario where the caller continually

calls detoast_iterate until gets enough data, so I think such operations can

be extracted as a macro, as I did in patch v6. In the macro, the detoast_iterate

function is called only when the data requested by the caller is greater than the

buffer limit.

(I'm not sure of the error handling, see #2.)
In this scheme, the only reason to know length is to pass to
pglz_decompress_iterate() in the case of in-line compression. As I
alluded to in my first review, I don't think it's worth the complexity
to handle that iteratively since the value is only a few kB. I made it
so in-line datums are fully decompressed as in HEAD and removed struct
members to match.

Sounds good. This not only simplifies the structure and logic of Detoast Iterator

but also has no major impact on efficiency.

I also noticed that no one updates or looks at
"toast_iter.done" so I removed that as well.

toast_iter.done is updated when the buffer limit reached the buffer capacity now.

So, I added it back.

Now pglz_decompress_iterate() doesn't need length at all. For testing
I just set decompress_all = true and let the compiler optimize away
the rest. I left finishing it for you if you agree with these changes.

Done.

2). detoast_iterate() and fetch_datum_iterate() return a value but we
don't check it or do anything with it. Should we do something with it?
It's also not yet clear if we should check the iterator state instead
of return values. I've added some XXX comments as a reminder. We
should also check the return value of pglz_decompress_iterate().

IMO, we need to provide users with a simple iterative interface.

Using the required data pointer to compare with the buffer limit is an easy way.
And the application scenarios of the iterator are mostly read operations.

So I think there is no need to return a value, and the iterator needs to throw an

exception for some wrong calls, such as all the data have been iterated,

but the user still calls the iterator.

3). Speaking of pglz_decompress_iterate(), I diff'd it with
pglz_decompress(), and I have some questions on it:

a).
+ srcend = (const unsigned char *) (source->limit == source->capacity
? source->limit : (source->limit - 4));

What does the 4 here mean in this expression?

Since we fetch chunks one by one, if we make srcend equals to the source buffer limit,

In the while loop "while (sp < srcend && dp < destend)", sp may exceed the source buffer limit and

read unallocated bytes. Giving a four-byte buffer can prevent sp from exceeding the source buffer limit.

If we have read all the chunks, we don't need to be careful to cross the border,

just make srcend equal to source buffer limit. I've added comments to explain it in patch v6.

Is it possible it's
compensating for this bit in init_toast_buffer()?

+ buf->limit = VARDATA(buf->buf);

It seems the initial limit should also depend on whether the datum is
compressed, right? Can we just do this:

+ buf->limit = buf->position;

I'm afraid not. buf->position points to the data portion of the buffer, but the beginning of

the chunks we read may contain header information. For example, for compressed data chunks,

the first four bytes record the size of raw data, this means that limit is four bytes ahead of position.

This initialization doesn't cause errors, although the position is less than the limit in other cases.

Because we always fetch chunks first, then decompress it.

b).
- while (sp < srcend && dp < destend)
...
+ while (sp + 1 < srcend && dp < destend &&
...

Why is it here "sp + 1"?

Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch v6 to

achieve the purpose of parsing ctrl correctly every time.

4. Note that varlena.c has a static state variable, and a cleanup
function that currently does:

static void
text_position_cleanup(TextPositionState *state)
{
/* no cleanup needed */
}

It seems to be the detoast iterator could be embedded in this state
variable, and then free-ing can happen here. That has a possible
advantage that the iterator struct would be on the same cache line as
the state data. That would also remove the need to pass "iter" as a
parameter, since these functions already pass "state". I'm not sure if
this would be good for other users of the iterator, so maybe we can
hold off on that for now.

Good idea. I've implemented it in patch v6.

5. Would it be a good idea to add tests (not always practical), or
more Assert()'s? You probably already know this, but as a reminder
it's good to develop with asserts enabled, but never build with them
for performance testing.

I've added more Assert()'s to check iterator state.

I think that's enough for now. If you have any questions or
counter-arguments, let me know. I've set the commitfest entry to
waiting on author.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

BTW, I found that iterators come in handy for json/jsonb's find field value or get array elements operations.

I will continue to optimize the json/jsonb query based on the detoast iterator patch.

Best regards,

Binguo Bao

Attachment

0001-de-TOASTing-using-a-iterator-6.patch

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

02 August 2019, 07:12:20

On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao <djydewang@gmail.com> wrote:
>
> John Naylor <john.naylor@2ndquadrant.com> 于2019年7月29日周一 上午11:49写道：
>>
>> 1). For every needle comparison, text_position_next_internal()
>> calculates how much of the value is needed and passes that to
>> detoast_iterate(), which then calculates if it has to do something or
>> not. This is a bit hard to follow. There might also be a performance
>> penalty -- the following is just a theory, but it sounds plausible:
>> The CPU can probably correctly predict that detoast_iterate() will
>> usually return the same value it did last time, but it still has to
>> call the function and make sure, which I imagine is more expensive
>> than advancing the needle. Ideally, we want to call the iterator only
>> if we have to.
>>
>> In the attached patch (applies on top of your v5),
>> text_position_next_internal() simply compares hptr to the detoast
>> buffer limit, and calls detoast_iterate() until it can proceed. I
>> think this is clearer.
>
>
> Yes, I think this is a general scenario where the caller continually
> calls detoast_iterate until gets enough data, so I think such operations can
> be extracted as a macro, as I did in patch v6. In the macro, the detoast_iterate
> function is called only when the data requested by the caller is greater than the
> buffer limit.

I like the use of a macro here. However, I think we can find a better
location for the definition. See the header comment of fmgr.h:
"Definitions for the Postgres function manager and function-call
interface." Maybe tuptoaster.h is as good a place as any?

>> I also noticed that no one updates or looks at
>> "toast_iter.done" so I removed that as well.
>
>
> toast_iter.done is updated when the buffer limit reached the buffer capacity now.
> So, I added it back.

Okay.

>> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
>> don't check it or do anything with it. Should we do something with it?
>> It's also not yet clear if we should check the iterator state instead
>> of return values. I've added some XXX comments as a reminder. We
>> should also check the return value of pglz_decompress_iterate().
>
>
> IMO, we need to provide users with a simple iterative interface.
> Using the required data pointer to compare with the buffer limit is an easy way.
> And the application scenarios of the iterator are mostly read operations.
> So I think there is no need to return a value, and the iterator needs to throw an
> exception for some wrong calls, such as all the data have been iterated,
> but the user still calls the iterator.

Okay, and see these functions now return void. The orignal
pglz_decompress() returned a value that was check against corruption.
Is there a similar check we can do for the iterative version?

>> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
>> pglz_decompress(), and I have some questions on it:
>>
>> a).
>> + srcend = (const unsigned char *) (source->limit == source->capacity
>> ? source->limit : (source->limit - 4));
>>
>> What does the 4 here mean in this expression?
>
>
> Since we fetch chunks one by one, if we make srcend equals to the source buffer limit,
> In the while loop "while (sp < srcend && dp < destend)", sp may exceed the source buffer limit and read unallocated
bytes.

Why is this? That tells me the limit is incorrect. Can the setter not
determine the right value?

> Giving a four-byte buffer can prevent sp from exceeding the source buffer limit.

Why 4? That's a magic number. Why not 2, or 27?

> If we have read all the chunks, we don't need to be careful to cross the border,
> just make srcend equal to source buffer limit. I've added comments to explain it in patch v6.

That's a good thing to comment on, but it doesn't explain why. This
logic seems like a band-aid and I think a committer would want this to
be handled in a more principled way.

>> Is it possible it's
>> compensating for this bit in init_toast_buffer()?
>>
>> + buf->limit = VARDATA(buf->buf);
>>
>> It seems the initial limit should also depend on whether the datum is
>> compressed, right? Can we just do this:
>>
>> + buf->limit = buf->position;
>
>
> I'm afraid not. buf->position points to the data portion of the buffer, but the beginning of
> the chunks we read may contain header information. For example, for compressed data chunks,
> the first four bytes record the size of raw data, this means that limit is four bytes ahead of position.
> This initialization doesn't cause errors, although the position is less than the limit in other cases.
> Because we always fetch chunks first, then decompress it.

I see what you mean now. This could use a comment or two to explain
the stated constraints may not actually be satisfied at
initialization.

>> b).
>> - while (sp < srcend && dp < destend)
>> ...
>> + while (sp + 1 < srcend && dp < destend &&
>> ...
>>
>> Why is it here "sp + 1"?
>
>
> Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch v6 to
> achieve the purpose of parsing ctrl correctly every time.

Please explain further. Was the "sp + 1" correct behavior (and why),
or only for debugging setting ctrl/c correctly? Also, I don't think
the new logic for the ctrl/c variables is an improvement:

1. iter->ctrlc is intialized with '8' (even in the uncompressed case,
which is confusing). Any time you initialize with something not 0 or
1, it's a magic number, and here it's far from where the loop variable
is used. This is harder to read.

2. First time though the loop, iter->ctrlc = 8, which immediately gets
set back to 0.

3. At the end of the loop, iter->ctrl/c are unconditionally set. In
v5, there was a condition which would usually avoid this copying of
values through pointers.

>> 4. Note that varlena.c has a static state variable, and a cleanup
>> function that currently does:
>>
>> static void
>> text_position_cleanup(TextPositionState *state)
>> {
>> /* no cleanup needed */
>> }
>>
>> It seems to be the detoast iterator could be embedded in this state
>> variable, and then free-ing can happen here. That has a possible
>> advantage that the iterator struct would be on the same cache line as
>> the state data. That would also remove the need to pass "iter" as a
>> parameter, since these functions already pass "state". I'm not sure if
>> this would be good for other users of the iterator, so maybe we can
>> hold off on that for now.
>
>
> Good idea. I've implemented it in patch v6.

That's better, and I think we can take it a little bit farther.

1. Notice that TextPositionState is allocated on the stack in
text_position(), which passes both the "state" pointer and the "iter"
pointer to text_position_setup(), and only then sets state->iter =
iter. We can easily set this inside text_position(). That would get
rid of the need for other callers to pass NULL iter to
text_position_setup().

2. DetoastIteratorData is fixed size, so I see no reason to allocate
it on the heap. We could allocate it on the stack in text_pos(), and
pass the pointer to create_detoast_iterator() (in this case maybe a
better name is init_detoast_iterator), which would return a bool to
tell text_pos() whether to pass down the pointer or a NULL. The
allocation of other structs (toast buffer and fetch iterator) probably
can't be changed without more work.

>> 5. Would it be a good idea to add tests (not always practical), or
>> more Assert()'s? You probably already know this, but as a reminder
>> it's good to develop with asserts enabled, but never build with them
>> for performance testing.
>
>
> I've added more Assert()'s to check iterator state.

Okay.

> BTW, I found that iterators come in handy for json/jsonb's find field value or get array elements operations.
> I will continue to optimize the json/jsonb query based on the detoast iterator patch.

That will be an interesting use case.

There are other aspects of the patch I should investigate, but I'll
put that off for another time. Commitfest is over, but note that
review can happen at any time. I'll continue to do so as time permits.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

03 August 2019, 16:11:21

John Naylor <john.naylor@2ndquadrant.com> 于2019年8月2日周五下午3:12写道：

On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao <djydewang@gmail.com> wrote:
>
> John Naylor <john.naylor@2ndquadrant.com> 于2019年7月29日周一上午11:49写道：
>>
>> 1). For every needle comparison, text_position_next_internal()
>> calculates how much of the value is needed and passes that to
>> detoast_iterate(), which then calculates if it has to do something or
>> not. This is a bit hard to follow. There might also be a performance
>> penalty -- the following is just a theory, but it sounds plausible:
>> The CPU can probably correctly predict that detoast_iterate() will
>> usually return the same value it did last time, but it still has to
>> call the function and make sure, which I imagine is more expensive
>> than advancing the needle. Ideally, we want to call the iterator only
>> if we have to.
>>
>> In the attached patch (applies on top of your v5),
>> text_position_next_internal() simply compares hptr to the detoast
>> buffer limit, and calls detoast_iterate() until it can proceed. I
>> think this is clearer.
>
>
> Yes, I think this is a general scenario where the caller continually
> calls detoast_iterate until gets enough data, so I think such operations can
> be extracted as a macro, as I did in patch v6. In the macro, the detoast_iterate
> function is called only when the data requested by the caller is greater than the
> buffer limit.

I like the use of a macro here. However, I think we can find a better
location for the definition. See the header comment of fmgr.h:
"Definitions for the Postgres function manager and function-call
interface." Maybe tuptoaster.h is as good a place as any?

PG_DETOAST_ITERATE isn't a sample function-call interface,

But I notice that PG_FREE_IF_COPY is also defined in fmgr.h, whose logic is

similar to PG_DETOAST_ITERATE, make condition check first and then

decide whether to call the function. Besides, PG_DETOAST_DATUM,

PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE,
PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, it is reasonable

to put all the de-TOAST interface together.

>> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
>> don't check it or do anything with it. Should we do something with it?
>> It's also not yet clear if we should check the iterator state instead
>> of return values. I've added some XXX comments as a reminder. We
>> should also check the return value of pglz_decompress_iterate().
>
>
> IMO, we need to provide users with a simple iterative interface.
> Using the required data pointer to compare with the buffer limit is an easy way.
> And the application scenarios of the iterator are mostly read operations.
> So I think there is no need to return a value, and the iterator needs to throw an
> exception for some wrong calls, such as all the data have been iterated,
> but the user still calls the iterator.

Okay, and see these functions now return void. The orignal
pglz_decompress() returned a value that was check against corruption.
Is there a similar check we can do for the iterative version?

As far as I know, we can just do such check after all compressed data is decompressed.

If we are slicing, we can't do the check.

>> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
>> pglz_decompress(), and I have some questions on it:
>>
>> a).
>> + srcend = (const unsigned char *) (source->limit == source->capacity
>> ? source->limit : (source->limit - 4));
>>
>> What does the 4 here mean in this expression?
>
>
> Since we fetch chunks one by one, if we make srcend equals to the source buffer limit,
> In the while loop "while (sp < srcend && dp < destend)", sp may exceed the source buffer limit and read unallocated bytes.

Why is this? That tells me the limit is incorrect. Can the setter not
determine the right value?

There are three statments change `sp` value in the while loop `while (sp < srcend && dp < destend)`:

`ctrl = *sp++;`

`off = ((sp[0]) & 0xf0) << 4) | sp[1]; sp += 2;`

`len += *sp++`

Although we make sure `sp` is less than `srcend` when enter while loop, `sp` is likely to

go beyond the `srcend` in the loop, and we should ensure that `sp` is always smaller than `buf->limit` to avoid

reading unallocated data. So, `srcend` can't be initialized to `buf->limit`. Only one case is exceptional,

we've fetched all data chunks and 'buf->limit' reaches 'buf->capacity', it's imposisble to read unallocated

data via `sp`.

> Giving a four-byte buffer can prevent sp from exceeding the source buffer limit.

Why 4? That's a magic number. Why not 2, or 27?

As I explained above, `sp` may go beyond the `srcend`in the loop, up to the `srcend + 2`.

In theory, it's ok to set the buffer size to be greater than or equal 2.

> If we have read all the chunks, we don't need to be careful to cross the border,
> just make srcend equal to source buffer limit. I've added comments to explain it in patch v6.

That's a good thing to comment on, but it doesn't explain why.

Yes, the current comment is puzzling. I'll improve it.

This
logic seems like a band-aid and I think a committer would want this to
be handled in a more principled way.

I don't want to change pglz_decompress logic too much, the iterator should

pay more attention to saving and restoring the original pglz_decompress state.

>> Is it possible it's
>> compensating for this bit in init_toast_buffer()?
>>
>> + buf->limit = VARDATA(buf->buf);
>>
>> It seems the initial limit should also depend on whether the datum is
>> compressed, right? Can we just do this:
>>
>> + buf->limit = buf->position;
>
>
> I'm afraid not. buf->position points to the data portion of the buffer, but the beginning of
> the chunks we read may contain header information. For example, for compressed data chunks,
> the first four bytes record the size of raw data, this means that limit is four bytes ahead of position.
> This initialization doesn't cause errors, although the position is less than the limit in other cases.
> Because we always fetch chunks first, then decompress it.

I see what you mean now. This could use a comment or two to explain
the stated constraints may not actually be satisfied at
initialization.

Done.

>> b).
>> - while (sp < srcend && dp < destend)
>> ...
>> + while (sp + 1 < srcend && dp < destend &&
>> ...
>>
>> Why is it here "sp + 1"?
>
>
> Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch v6 to
> achieve the purpose of parsing ctrl correctly every time.

Please explain further. Was the "sp + 1" correct behavior (and why),
or only for debugging setting ctrl/c correctly?

In patch v5, If the condition is `sp < srcend`, suppose `sp = srcend - 1` before

entering the loop `while (sp < srcend && dp < destend)`, when entering the loop

and read a control byte(sp equals to `srcend` now), the program can't enter the

loop `for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`, then set `iter->ctrlc` to 0,

exit the first loop and then this iteration is over. At the next iteration,

the control byte will be reread since `iter->ctrlc` equals to 0, but the previous control byte

is not used. Changing the condition to `sp + 1 < srcend` avoid only one control byte is read

then the iterator is over.

Also, I don't think
the new logic for the ctrl/c variables is an improvement:

1. iter->ctrlc is intialized with '8' (even in the uncompressed case,
which is confusing). Any time you initialize with something not 0 or
1, it's a magic number, and here it's far from where the loop variable
is used. This is harder to read.

`iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress at the end of

the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid value is 0~7,

When `ctrlc` reaches 8, a control byte is read from the source

buffer to `ctrl` then set `ctrlc` to 0. And a control bytes should be read from the

source buffer to `ctrlc` on the first iteration. So `iter->ctrlc` should be intialized with '8'.

2. First time though the loop, iter->ctrlc = 8, which immediately gets
set back to 0.

As I explained above, `iter->ctrlc = 8` make a control byte be read

from the source buffer to `ctrl` on the first iteration. Besides, `iter->ctrlc = 8`

indicates that the valid value of `ctrlc` at the end of the last iteration was not

recorded, Obviously, there are no other iterations before the first iteration.

3. At the end of the loop, iter->ctrl/c are unconditionally set. In
v5, there was a condition which would usually avoid this copying of
values through pointers.

Patch v6 just records the value of `ctrlc` at the end of each iteration(or loop)

whether it is valid (0~7) or 8, and initializes `ctrlc` on the next iteration(or loop) correctly.

I think it is more concise in patch v6.

>> 4. Note that varlena.c has a static state variable, and a cleanup
>> function that currently does:
>>
>> static void
>> text_position_cleanup(TextPositionState *state)
>> {
>> /* no cleanup needed */
>> }
>>
>> It seems to be the detoast iterator could be embedded in this state
>> variable, and then free-ing can happen here. That has a possible
>> advantage that the iterator struct would be on the same cache line as
>> the state data. That would also remove the need to pass "iter" as a
>> parameter, since these functions already pass "state". I'm not sure if
>> this would be good for other users of the iterator, so maybe we can
>> hold off on that for now.
>
>
> Good idea. I've implemented it in patch v6.

That's better, and I think we can take it a little bit farther.

1. Notice that TextPositionState is allocated on the stack in
text_position(), which passes both the "state" pointer and the "iter"
pointer to text_position_setup(), and only then sets state->iter =
iter. We can easily set this inside text_position(). That would get
rid of the need for other callers to pass NULL iter to
text_position_setup().

Done.

2. DetoastIteratorData is fixed size, so I see no reason to allocate
it on the heap. We could allocate it on the stack in text_pos(), and
pass the pointer to create_detoast_iterator() (in this case maybe a
better name is init_detoast_iterator), which would return a bool to
tell text_pos() whether to pass down the pointer or a NULL. The
allocation of other structs (toast buffer and fetch iterator) probably
can't be changed without more work.

Done

If there is anything else that is not explained clearly, please point it out.

Best regards,

Binguo Bao

Attachment

0001-de-TOASTing-using-a-iterator-7.patch

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

14 August 2019, 05:00:25

On Sat, Aug 3, 2019 at 11:11 PM Binguo Bao <djydewang@gmail.com> wrote:
>
> John Naylor <john.naylor@2ndquadrant.com> 于2019年8月2日周五 下午3:12写道：
>>
>> I like the use of a macro here. However, I think we can find a better
>> location for the definition. See the header comment of fmgr.h:
>> "Definitions for the Postgres function manager and function-call
>> interface." Maybe tuptoaster.h is as good a place as any?
>
> PG_DETOAST_ITERATE isn't a sample function-call interface,
> But I notice that PG_FREE_IF_COPY is also defined in fmgr.h, whose logic is
> similar to PG_DETOAST_ITERATE, make condition check first and then
> decide whether to call the function. Besides, PG_DETOAST_DATUM,
> PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE,
> PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, it is reasonable
> to put all the de-TOAST interface together.

Hmm, it's strange that those macros ended up there, but now I see why
it makes sense to add new ones there also.

>> Okay, and see these functions now return void. The orignal
>> pglz_decompress() returned a value that was check against corruption.
>> Is there a similar check we can do for the iterative version?
>
> As far as I know, we can just do such check after all compressed data is decompressed.
> If we are slicing, we can't do the check.

Okay.

>> >> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
>> >> pglz_decompress(), and I have some questions on it:
>> >>
>> >> a).
>> >> + srcend = (const unsigned char *) (source->limit == source->capacity
>> >> ? source->limit : (source->limit - 4));
>> >>
>> >> What does the 4 here mean in this expression?
>> >
>> > Since we fetch chunks one by one, if we make srcend equals to the source buffer limit,
>> > In the while loop "while (sp < srcend && dp < destend)", sp may exceed the source buffer limit and read
unallocatedbytes. 
>>
>> Why is this? That tells me the limit is incorrect. Can the setter not
>> determine the right value?
>
> There are three statments change `sp` value in the while loop `while (sp < srcend && dp < destend)`:
> `ctrl = *sp++;`
> `off = ((sp[0]) & 0xf0) << 4) | sp[1]; sp += 2;`
> `len += *sp++`
> Although we make sure `sp` is less than `srcend` when enter while loop, `sp` is likely to
> go beyond the `srcend` in the loop, and we should ensure that `sp` is always smaller than `buf->limit` to avoid
> reading unallocated data. So, `srcend` can't be initialized to `buf->limit`. Only one case is exceptional,
> we've fetched all data chunks and 'buf->limit' reaches 'buf->capacity', it's imposisble to read unallocated
> data via `sp`.

Thank you for the detailed explanation and the comment.

>> Please explain further. Was the "sp + 1" correct behavior (and why),
>> or only for debugging setting ctrl/c correctly?
>
> In patch v5, If the condition is `sp < srcend`, suppose `sp = srcend - 1` before
> entering the loop `while (sp < srcend && dp < destend)`, when entering the loop
> and read a control byte(sp equals to `srcend` now), the program can't enter the
> loop `for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`, then set `iter->ctrlc` to 0,
> exit the first loop and then this iteration is over. At the next iteration,
> the control byte will be reread since `iter->ctrlc` equals to 0, but the previous control byte
> is not used. Changing the condition to `sp + 1 < srcend` avoid only one control byte is read
> then the iterator is over.

Okay, that's quite subtle. I agree the v6/7 way is more clear in this regard.

>> Also, I don't think
>> the new logic for the ctrl/c variables is an improvement:
>>
>> 1. iter->ctrlc is intialized with '8' (even in the uncompressed case,
>> which is confusing). Any time you initialize with something not 0 or
>> 1, it's a magic number, and here it's far from where the loop variable
>> is used. This is harder to read.
>
> `iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress at the end of
> the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid value is 0~7,
> When `ctrlc` reaches 8,  a control byte is read from the source
> buffer to `ctrl` then set `ctrlc` to 0. And a control bytes should be read from the
> source buffer to `ctrlc` on the first iteration. So `iter->ctrlc` should be intialized with '8'.

My point here is it looks strange out of context, but "0" looked
normal. Maybe a comment in init_detoast_buffer(), something like "8
means read a control byte from the source buffer on the first
iteration, see pg_lzdecompress_iterate()".

Or, possibly, we could have a macro like INVALID_CTRLC. That might
even improve the readability of the original function. This is just an
idea, and maybe others would disagree, so you don't need to change it
for now.

>> 3. At the end of the loop, iter->ctrl/c are unconditionally set. In
>> v5, there was a condition which would usually avoid this copying of
>> values through pointers.
>
> Patch v6 just records the value of `ctrlc` at the end of each iteration(or loop)
> whether it is valid (0~7) or 8, and initializes `ctrlc` on the next iteration(or loop) correctly.
> I think it is more concise in patch v6.

And, in the case mentioned above where we enter the while loop with sp
= src_end - 1 , we can read the control byte and still store the
correct value for ctrlc.

>>> [varlena.c api]
>> That's better, and I think we can take it a little bit farther.
>>
>> 1. Notice that TextPositionState is allocated on the stack in
>> text_position(), which passes both the "state" pointer and the "iter"
>> pointer to text_position_setup(), and only then sets state->iter =
>> iter. We can easily set this inside text_position(). That would get
>> rid of the need for other callers to pass NULL iter to
>> text_position_setup().
>
>
> Done.

That looks much cleaner, thanks.

I've repeated my performance test to make sure there's no additional
regression in my suffix tests:

                master       patch v7
comp. end       1560s        1600ms
uncomp. end     896ms         890ms

The regression from master in the compressed case is about 2.5%, which
is no different from the last patch I tested, so that's good.

At this point, there are no functional things that I think we need to
change. It's close to ready-for-committer. For the next version, I'd
like you go through the comments and edit for grammar, spelling, and
clarity as you see fit. I know you're not a native speaker of English,
so I can help you with anything that remains. Also note we use braces
on their own lines
{
    like this
}

We do have a source formatting tool (pgindent), but it helps
readability for committers to have it mostly standard beforehand.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

17 August 2019, 07:32:32

Hi John,

>> Also, I don't think
>> the new logic for the ctrl/c variables is an improvement:
>>
>> 1. iter->ctrlc is intialized with '8' (even in the uncompressed case,
>> which is confusing). Any time you initialize with something not 0 or
>> 1, it's a magic number, and here it's far from where the loop variable
>> is used. This is harder to read.
>
> `iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress at the end of
> the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid value is 0~7,
> When `ctrlc` reaches 8, a control byte is read from the source
> buffer to `ctrl` then set `ctrlc` to 0. And a control bytes should be read from the
> source buffer to `ctrlc` on the first iteration. So `iter->ctrlc` should be intialized with '8'.

My point here is it looks strange out of context, but "0" looked
normal. Maybe a comment in init_detoast_buffer(), something like "8
means read a control byte from the source buffer on the first
iteration, see pg_lzdecompress_iterate()".

Or, possibly, we could have a macro like INVALID_CTRLC. That might
even improve the readability of the original function. This is just an
idea, and maybe others would disagree, so you don't need to change it
for now.

All in all, the idea is much better than a magic number 8. So, I've implemented it.

At this point, there are no functional things that I think we need to
change. It's close to ready-for-committer. For the next version, I'd
like you go through the comments and edit for grammar, spelling, and
clarity as you see fit. I know you're not a native speaker of English,
so I can help you with anything that remains.

I've tried my best to improve the comments, but there should be room for further improvement

I hope you can help me perfect it.

Also note we use braces
on their own lines
{
like this
}

Done.

Best regards,

Binguo Bao

Attachment

0001-de-TOASTing-using-a-iterator-8.patch

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

19 August 2019, 04:55:37

On Fri, Aug 16, 2019 at 10:48 PM Binguo Bao <djydewang@gmail.com> wrote:
> [v8 patch with cosmetic changes]

Okay, looks good. I'll make a few style suggestions and corrections.
In the course of looking at this again, I have a few other questions
below as well.

It looks like you already do this for the most part, but I'll mention
that we try to keep lines, including comments, less than 80 characters
long. pgindent can try to fix that, but the results don't always look
nice.

About variable names: The iterator pointers are variously called
"iter", "iterator", and "fetch_iter". I found this confusing the first
time I read this code. I think we should use "iter" if we have only
one kind in the function, and "detoast_iter" and "fetch_iter" if we
have both kinds.
--

init_detoast_iterator():

+ * The "iterator" variable is normally just a local variable in the caller.

I don't think this comment is helpful to understand this function or its use.

+ * It only make sense to initialize de-TOAST iterator for external
on-disk value.

s/make/makes/
"a" de-TOAST iterator
s/value/values/

The comments in this function that start with "This is a ..." could be
shortened like this:

/* indirect pointer -- dereference it */

While looking at this again, I noticed we no longer need to test for
the in-line compressed case at all. I also tried some other cosmetic
rearrangements. Let me know what you think about the attached patch.
Also, I wonder if the VARATT_IS_EXTERNAL_INDIRECT case should come
first. Then the two normal cases are next to eachother.


free_detoast_iterator(), free_fetch_datum_iterator(), and free_toast_buffer():

These functions should return void.

+ * Free the memory space occupied by the de-TOAST iterator include buffers and
+ * fetch datum iterator.

Perhaps "Free memory used by the de-TOAST iterator, including buffers
and fetch datum iterator."

The check

if (iter->buf != iter->fetch_datum_iterator->buf)

is what we need to do for the compressed case. Could we use this
directly instead of having a separate state variable iter->compressed,
with a macro like this?

#define TOAST_ITER_COMPRESSED(iter) \
    (iter->buf != iter->fetch_datum_iterator->buf)

Or maybe that's too clever?


detoast_iterate():

+ * As long as there is another data chunk in compression or external storage,

We no longer use the iterator with in-line compressed values.

+ * de-TOAST it into toast buffer in iterator.

Maybe "into the iterator's toast buffer"


fetch_datum_iterate():

My remarks for detoast_iterate() also apply here.


init_toast_buffer():

+ * Note the constrain buf->position <= buf->limit may be broken
+ * at initialization. Make sure that the constrain is satisfied
+ * when consume chars.

s/constrain/constraint/ (2 times)
s/consume/consuming/

Also, this comment might be better at the top the whole function?


pglz_decompress_iterate():

+ * Decompresses source into dest until the source is exhausted.

This comment is from the original function, but I think it would be
better to highlight the differences from the original, something like:

"This function is based on pglz_decompress(), with these additional
requirements:

1. We need to save the current control byte and byte position for the
caller's next iteration.

2. In pglz_decompress(), we can assume we have all the source bytes
available. This is not the case when we decompress one chunk at a
time, so we have to make sure that we only read bytes available in the
current chunk."

(I'm not sure about the term 'byte position', maybe there's a better one.)

+ * In the while loop, sp may go beyond the srcend, provides a four-byte
+ * buffer to prevent sp from reading unallocated bytes from source buffer.
+ * When source->limit reaches source->capacity, don't worry about reading
+ * unallocated bytes.

Here's my suggestion:

"In the while loop, sp may be incremented such that it points beyond
srcend. To guard against reading beyond the end of the current chunk,
we set srcend such that we exit the loop when we are within four bytes
of the end of the current chunk. When source->limit reaches
source->capacity, we are decompressing the last chunk, so we can (and
need to) read every byte."

+ for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)

Note you can also replace 8 with INVALID_CTRLC here.

tuptoaster.h:
+ * Constrains that need to be satisfied:

s/constrains/constraints/

+ * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
+ * the field is invalid and need to read the control byte from the
+ * source buffer in the next iteration, see pglz_decompress_iterate().
+ */
+#define INVALID_CTRLC 8

I think the macro might be better placed in pg_lzcompress.h, and for
consistency used in pglz_decompress(). Then the comment can be shorter
and more general. With my additional comment in
init_detoast_iterator(), hopefully it will be clear to readers.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

rearrange-init-iter.patch

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

21 August 2019, 17:10:43

John Naylor <john.naylor@2ndquadrant.com> 于2019年8月19日周一下午12:55写道：

init_toast_buffer():

+ * Note the constrain buf->position <= buf->limit may be broken
+ * at initialization. Make sure that the constrain is satisfied
+ * when consume chars.

s/constrain/constraint/ (2 times)
s/consume/consuming/

Also, this comment might be better at the top the whole function?

The constraint is broken in the if branch, so I think put this comment in the branch

is more precise.

The check
if (iter->buf != iter->fetch_datum_iterator->buf)
is what we need to do for the compressed case. Could we use this
directly instead of having a separate state variable iter->compressed,
with a macro like this?
#define TOAST_ITER_COMPRESSED(iter) \
(iter->buf != iter->fetch_datum_iterator->buf)

The logic of the macro may be hard to understand, so I think it's ok to just check the compressed state variable.

+ * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
+ * the field is invalid and need to read the control byte from the
+ * source buffer in the next iteration, see pglz_decompress_iterate().
+ */
+#define INVALID_CTRLC 8

I think the macro might be better placed in pg_lzcompress.h, and for
consistency used in pglz_decompress(). Then the comment can be shorter
and more general. With my additional comment in
init_detoast_iterator(), hopefully it will be clear to readers.

The main role of this macro is to explain the iterator's "ctrlc" state, IMO it's reasonable to put

the macro and definition of de-TOAST iterator together.

Thanks for your suggestion, I have updated the patch.

Best regards,

Binguo Bao

Attachment

0001-de-TOASTing-using-a-iterator-9.patch

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

22 August 2019, 03:02:01

On Thu, Aug 22, 2019 at 12:10 AM Binguo Bao <djydewang@gmail.com> wrote:
> [v9 patch]

Thanks, looks good. I'm setting it to ready for committer.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Alvaro Herrera

Date:

03 September 2019, 20:12:26

> +static void
> +init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed)
> +{
> +    buf->buf = (const char *) palloc0(size);

This API is weird -- you always palloc the ToastBuffer first, then call
init_toast_bufer on it.  Why not palloc the ToastBuffer struct in
init_toast_buffer and return it from there instead?  This is
particularly strange since the ToastBuffer itself is freed by the "free"
routine ... so it's not like we're thinking that caller can take
ownership of the struct by embedding it in a larger struct.

Also, this function needs a comment on top explaining what it does and
what the params are.

Why do we need ToastBuffer->buf_size?  Seems unused.

> +    if (iter == NULL)
> +    {
> +        return;
> +    }

Please, no braces around single-statement blocks.  (Many places).

> +/*
> + * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
> + * the field is invalid and need to read the control byte from the
> + * source buffer in the next iteration, see pglz_decompress_iterate().
> + */
> +#define INVALID_CTRLC 8

What does CTRLC stand for?  Also: this comment should explain why the
value 8 is what it is.

> +                /*
> +                 * Now we copy the bytes specified by the tag from OUTPUT to
> +                 * OUTPUT. It is dangerous and platform dependent to use
> +                 * memcpy() here, because the copied areas could overlap
> +                 * extremely!
> +                 */
> +                len = Min(len, destend - dp);
> +                while (len--)
> +                {
> +                    *dp = dp[-off];
> +                    dp++;
> +                }

So why not use memmove?

> +                /*
> +                 * Otherwise it contains the match length minus 3 and the
> +                 * upper 4 bits of the offset. The next following byte
> +                 * contains the lower 8 bits of the offset. If the length is
> +                 * coded as 18, another extension tag byte tells how much
> +                 * longer the match really was (0-255).
> +                 */
> +                int32        len;
> +                int32        off;
> +
> +                len = (sp[0] & 0x0f) + 3;
> +                off = ((sp[0] & 0xf0) << 4) | sp[1];
> +                sp += 2;
> +                if (len == 18)
> +                    len += *sp++;

Starting this para with "Otherwise" makes no sense, since there's no
previous opposite case.  Please reword.  However, I don't recognize this
code from anywhere, and it seems to have a lot of magical numbers.  Is
this code completely new?


Didn't much like FetchDatumIteratorData SnapshotToast struct member
name.  How about just "snapshot"?

> +#define PG_DETOAST_ITERATE(iter, need)                                    \
> +    do {                                                                \
> +        Assert(need >= iter->buf->buf && need <= iter->buf->capacity);    \
> +        while (!iter->done && need >= iter->buf->limit) {                 \
> +            detoast_iterate(iter);                                        \
> +        }                                                                \
> +    } while (0)

This needs parens around each "iter" and "need" in the macro definition.
Also, please add a comment documenting what the arguments are, since
it's not immediately obvious.

> +void free_detoast_iterator(DetoastIterator iter)
> +{
> +    if (iter == NULL)
> +    {
> +        return;
> +    }

If this function is going to do this, why do callers need to check for
NULL also?  Seems pointless.  I'd rather make callers simpler and keep
only the NULL-check inside the function, since this is not perf-critical
anyway.

> +extern void detoast_iterate(DetoastIterator detoast_iter)
> +{

Please, no "extern" in function definitions, only in prototypes in the
.h files.  Also, we indent the function name at the start of line, with
the return type appearing on its own in the previous line.

> +    if (!VARATT_IS_EXTERNAL_ONDISK(attr))
> +        elog(ERROR, "create_fetch_datum_itearator shouldn't be called for non-ondisk datums");

Typo for "iterator".

> +        iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
> +        VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
> +        if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> +        {
> [...]
> +        }
> +        else
> +        {
> +            iter->compressed = false;
> +
> +            /* point the buffer directly at the raw data */
> +            iter->buf = iter->fetch_datum_iterator->buf;
> +        }

This arrangement where there are two ToastBuffers and they sometimes are
the same is cute, but I think we need a better way to know when each
needs to be freed afterwards; the proposed coding is confusing.  And it
certainly it needs more than zero comments about what's going on there.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Alvaro Herrera from 2ndQuadrant

Date:

06 September 2019, 14:52:53

Also: this patch no longer applies.  Please rebase.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

10 September 2019, 13:33:51

Alvaro Herrera <alvherre@2ndquadrant.com> 于2019年9月4日周三上午4:12写道：

> +static void
> +init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed)
> +{
> + buf->buf = (const char *) palloc0(size);

This API is weird -- you always palloc the ToastBuffer first, then call
init_toast_bufer on it. Why not palloc the ToastBuffer struct in
init_toast_buffer and return it from there instead? This is
particularly strange since the ToastBuffer itself is freed by the "free"
routine ... so it's not like we're thinking that caller can take
ownership of the struct by embedding it in a larger struct.

I agree with you. I also change "init_detoast_iterator" to "create_detoast_iterator"

so the caller doesn't need to manage the memory allocation of the iterator

Also, this function needs a comment on top explaining what it does and
what the params are.

Done.

Why do we need ToastBuffer->buf_size? Seems unused.

> + if (iter == NULL)
> + {
> + return;
> + }

Removed.

Please, no braces around single-statement blocks. (Many places).

Done.

> +/*
> + * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
> + * the field is invalid and need to read the control byte from the
> + * source buffer in the next iteration, see pglz_decompress_iterate().
> + */
> +#define INVALID_CTRLC 8

What does CTRLC stand for? Also: this comment should explain why the
value 8 is what it is.

I've improved the comment.

> + /*
> + * Now we copy the bytes specified by the tag from OUTPUT to
> + * OUTPUT. It is dangerous and platform dependent to use
> + * memcpy() here, because the copied areas could overlap
> + * extremely!
> + */
> + len = Min(len, destend - dp);
> + while (len--)
> + {
> + *dp = dp[-off];
> + dp++;
> + }

So why not use memmove?

> + /*
> + * Otherwise it contains the match length minus 3 and the
> + * upper 4 bits of the offset. The next following byte
> + * contains the lower 8 bits of the offset. If the length is
> + * coded as 18, another extension tag byte tells how much
> + * longer the match really was (0-255).
> + */
> + int32 len;
> + int32 off;
> +
> + len = (sp[0] & 0x0f) + 3;
> + off = ((sp[0] & 0xf0) << 4) | sp[1];
> + sp += 2;
> + if (len == 18)
> + len += *sp++;

Starting this para with "Otherwise" makes no sense, since there's no
previous opposite case. Please reword. However, I don't recognize this
code from anywhere, and it seems to have a lot of magical numbers. Is
this code completely new?

This function is based on pglz_decompress() in src/common/pg_lzcompress.c and I've

mentioned that in the function's comment at the beginning.

Didn't much like FetchDatumIteratorData SnapshotToast struct member
name. How about just "snapshot"?

Done.

> +#define PG_DETOAST_ITERATE(iter, need) \
> + do { \
> + Assert(need >= iter->buf->buf && need <= iter->buf->capacity); \
> + while (!iter->done && need >= iter->buf->limit) { \
> + detoast_iterate(iter); \
> + } \
> + } while (0)

This needs parens around each "iter" and "need" in the macro definition.
Also, please add a comment documenting what the arguments are, since
it's not immediately obvious.

Parens makes the macro more reliable. Done.

> +void free_detoast_iterator(DetoastIterator iter)
> +{
> + if (iter == NULL)
> + {
> + return;
> + }

If this function is going to do this, why do callers need to check for
NULL also? Seems pointless. I'd rather make callers simpler and keep
only the NULL-check inside the function, since this is not perf-critical
anyway.

Good catch. Done.

> + iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);

> + VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
> + if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> + {
> [...]
> + }
> + else
> + {
> + iter->compressed = false;
> +
> + /* point the buffer directly at the raw data */
> + iter->buf = iter->fetch_datum_iterator->buf;
> + }

This arrangement where there are two ToastBuffers and they sometimes are
the same is cute, but I think we need a better way to know when each
needs to be freed afterwards;

We only need to check the "compressed" field in the iterator to figure out

which buffer should be freed.

Best regards,

Binguo Bao

Attachment

0001-de-TOASTing-using-a-iterator-10.patch

Re: [proposal] de-TOAST'ing using a iterator

From

Alvaro Herrera

Date:

16 September 2019, 21:22:51

On 2019-Sep-10, Binguo Bao wrote:

> +/*
> + * Support for de-TOASTing toasted value iteratively. "need" is a pointer
> + * between the beginning and end of iterator's ToastBuffer. The marco
> + * de-TOAST all bytes before "need" into iterator's ToastBuffer.
> + */
> +#define PG_DETOAST_ITERATE(iter, need)                                            \
> +    do {                                                                        \
> +        Assert((need) >= (iter)->buf->buf && (need) <= (iter)->buf->capacity);    \
> +        while (!(iter)->done && (need) >= (iter)->buf->limit) {                 \
> +            detoast_iterate(iter);                                                \
> +        }                                                                        \
> +    } while (0)
>  /* WARNING -- unaligned pointer */
>  #define PG_DETOAST_DATUM_PACKED(datum) \
>      pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))

In broad terms this patch looks pretty good to me.  I only have a small
quibble with this API definition in fmgr.h -- namely that it forces us
to export the definition of all the structs (that could otherwise be
private to toast_internals.h) in order to satisfy callers of this macro.
I am wondering if it would be possible to change detoast_iterate and
PG_DETOAST_ITERATE in a way that those details remain hidden -- I am
thinking something like "if this returns NULL, then iteration has
finished"; and relieve the macro from doing the "->buf->buf" and
"->buf->limit" checks.  I think changing that would require a change in
how the rest of the code is structured around this (the textpos internal
function), but seems like it would be better overall.

(AFAICS that would enable us to expose much less about the
iterator-related structs to detoast.h -- you should be able to move the
struct defs to toast_internals.h)

Then again, it might be just wishful thinking, but it seems worth
considering at least.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Tomas Vondra

Date:

17 September 2019, 13:34:20

On Mon, Sep 16, 2019 at 06:22:51PM -0300, Alvaro Herrera wrote:
>On 2019-Sep-10, Binguo Bao wrote:
>
>> +/*
>> + * Support for de-TOASTing toasted value iteratively. "need" is a pointer
>> + * between the beginning and end of iterator's ToastBuffer. The marco
>> + * de-TOAST all bytes before "need" into iterator's ToastBuffer.
>> + */
>> +#define PG_DETOAST_ITERATE(iter, need)                                            \
>> +    do {                                                                        \
>> +        Assert((need) >= (iter)->buf->buf && (need) <= (iter)->buf->capacity);    \
>> +        while (!(iter)->done && (need) >= (iter)->buf->limit) {                 \
>> +            detoast_iterate(iter);                                                \
>> +        }                                                                        \
>> +    } while (0)
>>  /* WARNING -- unaligned pointer */
>>  #define PG_DETOAST_DATUM_PACKED(datum) \
>>      pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))
>
>In broad terms this patch looks pretty good to me.  I only have a small
>quibble with this API definition in fmgr.h -- namely that it forces us
>to export the definition of all the structs (that could otherwise be
>private to toast_internals.h) in order to satisfy callers of this macro.
>I am wondering if it would be possible to change detoast_iterate and
>PG_DETOAST_ITERATE in a way that those details remain hidden -- I am
>thinking something like "if this returns NULL, then iteration has
>finished"; and relieve the macro from doing the "->buf->buf" and
>"->buf->limit" checks.  I think changing that would require a change in
>how the rest of the code is structured around this (the textpos internal
>function), but seems like it would be better overall.
>
>(AFAICS that would enable us to expose much less about the
>iterator-related structs to detoast.h -- you should be able to move the
>struct defs to toast_internals.h)
>
>Then again, it might be just wishful thinking, but it seems worth
>considering at least.
>

I do agree hiding the exact struct definition would be nice. IMHO if the
only reason for exposing it is the PG_DETOAST_ITERATE() macro (or rather
the references to buf fields in it) then we can simply provide functions
to return those fields.

Granted, that may have impact on performance, but I'm not sure it'll be
even measurable. Also, the other detoast macros right before this new
one are also ultimately just a function calls.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Binguo Bao

Date:

23 September 2019, 13:55:24

Alvaro Herrera <alvherre@2ndquadrant.com> 于2019年9月17日周二上午5:51写道：

On 2019-Sep-10, Binguo Bao wrote:

> +/*
> + * Support for de-TOASTing toasted value iteratively. "need" is a pointer
> + * between the beginning and end of iterator's ToastBuffer. The marco
> + * de-TOAST all bytes before "need" into iterator's ToastBuffer.
> + */
> +#define PG_DETOAST_ITERATE(iter, need) \
> + do { \
> + Assert((need) >= (iter)->buf->buf && (need) <= (iter)->buf->capacity); \
> + while (!(iter)->done && (need) >= (iter)->buf->limit) { \
> + detoast_iterate(iter); \
> + } \
> + } while (0)
> /* WARNING -- unaligned pointer */
> #define PG_DETOAST_DATUM_PACKED(datum) \
> pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))

In broad terms this patch looks pretty good to me. I only have a small
quibble with this API definition in fmgr.h -- namely that it forces us
to export the definition of all the structs (that could otherwise be
private to toast_internals.h) in order to satisfy callers of this macro.
I am wondering if it would be possible to change detoast_iterate and
PG_DETOAST_ITERATE in a way that those details remain hidden -- I am
thinking something like "if this returns NULL, then iteration has
finished"; and relieve the macro from doing the "->buf->buf" and
"->buf->limit" checks. I think changing that would require a change in
how the rest of the code is structured around this (the textpos internal
function), but seems like it would be better overall.

(AFAICS that would enable us to expose much less about the
iterator-related structs to detoast.h -- you should be able to move the
struct defs to toast_internals.h)

Then again, it might be just wishful thinking, but it seems worth
considering at least.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

I've tied to hide the details of the struct in patch v11 with checking "need" pointer

inside detoast_iterate function. I also compared the performance of the two versions.

patch v10 patch v11
comp. beg. 1413ms 1489ms
comp. end 24327ms 28011ms
uncomp. beg. 1439ms 1432ms
uncomp. end 25019ms 29007ms

We can see that v11 is about 15% slower than v10 on suffix queries since this involves

the complete de-TOASTing and detoast_iterate() function is called frequently in v11.

Personally, I prefer patch v10. Its performance is superior, although it exposes some struct details.

Best regards,

Binguo Bao

Attachment

Re: [proposal] de-TOAST'ing using a iterator

From

Alvaro Herrera

Date:

23 September 2019, 16:45:29

Paul Ramsey, do you have opinions to share about this patch?  I think
PostGIS might benefit from it.  Thread starts here:

https://postgr.es/m/CAL-OGks_onzpc9M9bXPCztMofWULcFkyeCeKiAgXzwRL8kXiag@mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Paul Ramsey

Date:

25 September 2019, 20:40:36

> On Sep 23, 2019, at 9:45 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Paul Ramsey, do you have opinions to share about this patch?  I think
> PostGIS might benefit from it.  Thread starts here:

I like the idea a great deal, but actually PostGIS is probably neutral on it: we generally want to retrieve things off
thefront of our serializations (the metadata header) rather than search through them for things in the middle. So the
improvementsto Pg12 cover all of our use cases. Haven’t had time to do any performance checks on it yet. 

ATB,

P.

Re: [proposal] de-TOAST'ing using a iterator

From

Michael Paquier

Date:

27 November 2019, 08:20:52

On Mon, Sep 23, 2019 at 09:55:24PM +0800, Binguo Bao wrote:
> Personally, I prefer patch v10. Its performance is superior, although it
> exposes some struct details.

Please be careful.  The patch was waiting for author input, but its
latest status does not match what the CF app was saying.  I have moved
this patch to next CF, with "Needs review" as status.
--
Michael

Attachment

signature.asc

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

12 January 2020, 02:53:24

On Mon, Sep 23, 2019 at 9:55 PM Binguo Bao <djydewang@gmail.com> wrote:
>
> Alvaro Herrera <alvherre@2ndquadrant.com> 于2019年9月17日周二 上午5:51写道：
>> In broad terms this patch looks pretty good to me.  I only have a small
>> quibble with this API definition in fmgr.h -- namely that it forces us
>> to export the definition of all the structs (that could otherwise be
>> private to toast_internals.h) in order to satisfy callers of this macro.
>> I am wondering if it would be possible to change detoast_iterate and
>> PG_DETOAST_ITERATE in a way that those details remain hidden -- I am
>> thinking something like "if this returns NULL, then iteration has
>> finished"; and relieve the macro from doing the "->buf->buf" and
>> "->buf->limit" checks.  I think changing that would require a change in
>> how the rest of the code is structured around this (the textpos internal
>> function), but seems like it would be better overall.
>>
>> (AFAICS that would enable us to expose much less about the
>> iterator-related structs to detoast.h -- you should be able to move the
>> struct defs to toast_internals.h)
>>
>> Then again, it might be just wishful thinking, but it seems worth
>> considering at least.
>
> I've tied to hide the details of the struct in patch v11 with checking "need" pointer
> inside detoast_iterate function.

I took a brief look at v11 to see if there's anything I can do to help
it move forward. I'm not yet sure how it would look code-wise to
implement Alvaro and Tomas's comments upthread, but I'm pretty sure
this part means the iterator-related structs are just as exposed as
before, but in a roundabout way that completely defeats the purpose of
hiding internals:

--- a/src/include/access/detoast.h
+++ b/src/include/access/detoast.h
@@ -11,6 +11,7 @@
  */
 #ifndef DETOAST_H
 #define DETOAST_H
+#include "toast_internals.h"

That said, the idea behind the PG_DETOAST_ITERATE macro was my
suggestion so that text_position_next_internal() didn't have to call
the iterator function every time the needle advances, which caused a
noticeable performance penalty. The toast code has moved around quite
a bit since then, and I'm not sure of the best way forward.

Also, c60e520f6e0 changed the standard pglz decompression algorithm.
It might be worth it to see if those changes are applicable to the
iterator case. At least one of the improved comments could be brought
over.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

Alvaro Herrera

Date:

13 March 2020, 13:21:14

On 2020-Jan-12, John Naylor wrote:
> 
> I took a brief look at v11 to see if there's anything I can do to help
> it move forward. I'm not yet sure how it would look code-wise to
> implement Alvaro and Tomas's comments upthread, but I'm pretty sure
> this part means the iterator-related structs are just as exposed as
> before, but in a roundabout way that completely defeats the purpose of
> hiding internals:

Agreed -- I think this patch still needs more work before being
committable; I agree with John that the changes after v10 made it worse,
not better.  Rather than cross-including header files, it seems better
to expose some struct definitions after all and let the main iterator
interface (detoast_iterate) be a "static inline" function in detoast.h.

So let's move forward with v10 (submitted on Sept 10th).

Looking at that version, I don't think the function protos that were put
in toast_internals.h should be there at all; I think they should be in
detoast.h so that they can be used.  But I don't like the fact that
detoast.h now has to include genam.h; that seems pretty bogus.  I think
this can be fixed by moving the FetchDatumIteratorData struct definition
(but not its typedef) to toast_internals.h.

OTOH we've recently seen the TOAST interface (and header files) heavily
reworked because of table-AM considerations, so probably this needs even
more changes to avoid parts of it becoming heapam-dependant again.

create_toast_buffer() doing two pallocs seems a waste.  It could be a
single one,
+   buf = (ToastBuffer *) palloc0(MAXALIGN(sizeof(ToastBuffer)) + size);
+   buf->buf = buf + MAXALIGN(sizeof(ToastBuffer));
(I'm not sure that the MAXALIGNs are strictly necessary there; I think
we access the buf as four-byte aligned stuff somewhere in the toast
innards, but maybe I'm wrong about that.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

25 March 2020, 10:04:53

On Fri, Mar 13, 2020 at 10:19 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>
> So let's move forward with v10 (submitted on Sept 10th).

In the attached v12, based on v10, I've made some progress to address
some of the remaining issues. There's still some work to be done, in
particular to think about how to hide the struct details better, as
mentioned by you and Tomas back in September, but wanted to put this
much out there to keep things moving.

> Rather than cross-including header files, it seems better
> to expose some struct definitions after all and let the main iterator
> interface (detoast_iterate) be a "static inline" function in detoast.h.

The cross-include is gone, and detoast_iterate is now static inline.

> Looking at that version, I don't think the function protos that were put
> in toast_internals.h should be there at all; I think they should be in
> detoast.h so that they can be used.

Done.

> But I don't like the fact that
> detoast.h now has to include genam.h; that seems pretty bogus.  I think
> this can be fixed by moving the FetchDatumIteratorData struct definition
> (but not its typedef) to toast_internals.h.

I took a stab at this, but I ended up playing whack-a-mole with
compiler warnings. I'll have to step back and try again later.

> OTOH we've recently seen the TOAST interface (and header files) heavily
> reworked because of table-AM considerations, so probably this needs even
> more changes to avoid parts of it becoming heapam-dependant again.

Haven't thought about this.

> create_toast_buffer() doing two pallocs seems a waste.  It could be a
> single one,
> +   buf = (ToastBuffer *) palloc0(MAXALIGN(sizeof(ToastBuffer)) + size);
> +   buf->buf = buf + MAXALIGN(sizeof(ToastBuffer));
> (I'm not sure that the MAXALIGNs are strictly necessary there; I think
> we access the buf as four-byte aligned stuff somewhere in the toast
> innards, but maybe I'm wrong about that.)

I tried this briefly and got backend crashes, and didn't try to analyze further.

In addition, I brought in the memcpy() and comment changes in
c60e520f6e from common/pg_lzcompress.c to pglz_decompress_iterate(). I
also made a typo correction in the former, which could be extracted
into a separate patch if this one is not ready in time.

For this comment back in [1]:

> This arrangement where there are two ToastBuffers and they sometimes are
> the same is cute, but I think we need a better way to know when each
> needs to be freed afterwards; the proposed coding is confusing. And it
> certainly it needs more than zero comments about what's going on there.

One idea is to test if the pointers are equal via a macro, rather than
setting and testing a member bool var. And I agree commentary could be
improved in this area.

[1] https://www.postgresql.org/message-id/20190903201226.GA16197%40alvherre.pgsql

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

v12-detoasting-using-an-iterator.patch

Re: [proposal] de-TOAST'ing using a iterator

From

Anastasia Lubennikova

Date:

02 November 2020, 17:23:47

Status update for a commitfest entry.

This entry was inactive for a very long time. 
John, are you going to continue working on this?

The last message mentions some open issues, namely backend crashes, so I move it to "Waiting on author".

The new status of this patch is: Waiting on Author

Re: [proposal] de-TOAST'ing using a iterator

From

Alvaro Herrera

Date:

02 November 2020, 17:30:34

On 2020-Nov-02, Anastasia Lubennikova wrote:

> Status update for a commitfest entry.
> 
> This entry was inactive for a very long time. 
> John, are you going to continue working on this?
> 
> The last message mentions some open issues, namely backend crashes, so I move it to "Waiting on author".

As I understand, the patch he posted is fine -- it only crashes when he
tried a change I suggested.  But (as is apparently common) I might be
suggesting the wrong thing.  Since the cfbot says the patch still
applies and works, I suggest to keep it as needs-review.

Re: [proposal] de-TOAST'ing using a iterator

From

John Naylor

Date:

02 November 2020, 19:08:57

On Mon, Nov 2, 2020 at 1:30 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2020-Nov-02, Anastasia Lubennikova wrote:

> Status update for a commitfest entry.
>
> This entry was inactive for a very long time.
> John, are you going to continue working on this?

Not in the near future. For background, this was a 2019 GSoC project where I was reviewer of record, and the patch is mostly good, but there is some architectural awkwardness. I have tried to address that, but have not had success.

> The last message mentions some open issues, namely backend crashes, so I move it to "Waiting on author".

As I understand, the patch he posted is fine -- it only crashes when he
tried a change I suggested.

That's my recollection as well.

John Naylor

EnterpriseDB: http://www.enterprisedb.com

The Enterprise PostgreSQL Company

Re: [proposal] de-TOAST'ing using a iterator

From

Anastasia Lubennikova

Date:

27 November 2020, 08:31:15

On 02.11.2020 22:08, John Naylor wrote:

On Mon, Nov 2, 2020 at 1:30 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2020-Nov-02, Anastasia Lubennikova wrote:

> Status update for a commitfest entry.
>
> This entry was inactive for a very long time.
> John, are you going to continue working on this?

Not in the near future. For background, this was a 2019 GSoC project where I was reviewer of record, and the patch is mostly good, but there is some architectural awkwardness. I have tried to address that, but have not had success.

The commitfest is nearing the end and as this thread has stalled, I've marked it Returned with Feedback. Feel free to open a new entry if you return to this patch.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company