From: Binguo Bao
Subject: Re: [proposal] de-TOAST'ing using a iterator
In response to: Re: [proposal] de-TOAST'ing using a iterator (John Naylor <john.naylor@2ndquadrant.com>)
List: pgsql-hackers


John Naylor <john.naylor@2ndquadrant.com> wrote on Mon, Jul 29, 2019 at 11:49 AM:
On Thu, Jul 25, 2019 at 10:21 PM Binguo Bao <djydewang@gmail.com> wrote:
My goal for this stage of review was to understand more fully what the
code is doing, and make it as simple and clear as possible, starting
at the top level. In doing so, it looks like I found some additional
performance gains. I haven't looked much yet at the TOAST fetching
logic.


1). For every needle comparison, text_position_next_internal()
calculates how much of the value is needed and passes that to
detoast_iterate(), which then calculates if it has to do something or
not. This is a bit hard to follow. There might also be a performance
penalty -- the following is just a theory, but it sounds plausible:
The CPU can probably correctly predict that detoast_iterate() will
usually return the same value it did last time, but it still has to
call the function and make sure, which I imagine is more expensive
than advancing the needle. Ideally, we want to call the iterator only
if we have to.

In the attached patch (applies on top of your v5),
text_position_next_internal() simply compares hptr to the detoast
buffer limit, and calls detoast_iterate() until it can proceed. I
think this is clearer.

Yes, I think this is a general scenario where the caller repeatedly calls detoast_iterate()
until it has enough data, so such operations can be extracted into a macro, as I did in
patch v6. In the macro, detoast_iterate() is called only when the data requested by the
caller lies beyond the buffer limit. A minimal sketch of what such a macro could look like
is below (the macro name and the "need" parameter are illustrative, not the exact v6 code;
only detoast_iterate() and the buffer fields come from this discussion).
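
/*
 * Illustrative sketch only: advance the iterator until the byte the caller
 * needs ("need") has been de-TOASTed into the buffer, calling
 * detoast_iterate() only when the requested data lies beyond the limit.
 */
#define PG_DETOAST_ITERATE(iter, need) \
    do { \
        while (!(iter)->done && (need) > (iter)->buf->limit) \
            detoast_iterate(iter); \
    } while (0)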

(I'm not sure of the error handling, see #2.)
In this scheme, the only reason to know length is to pass to
pglz_decompress_iterate() in the case of in-line compression. As I
alluded to in my first review, I don't think it's worth the complexity
to handle that iteratively since the value is only a few kB. I made it
so in-line datums are fully decompressed as in HEAD and removed struct
members to match.

Sounds good. This not only simplifies the structure and logic of the de-TOAST iterator,
it also has no significant impact on efficiency.
 
I also noticed that no one updates or looks at
"toast_iter.done" so I removed that as well.

toast_iter.done is now updated when the buffer limit reaches the buffer capacity,
so I added it back. In other words (a one-line sketch, not the exact v6 code):
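
    /* the last chunk has been fetched once the fill limit reaches the capacity */
    if (iter->buf->limit == iter->buf->capacity)
        iter->done = true;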
 
Now pglz_decompress_iterate() doesn't need length at all. For testing
I just set decompress_all = true and let the compiler optimize away
the rest. I left finishing it for you if you agree with these changes.

Done. 
 
2). detoast_iterate() and fetch_datum_iterate() return a value but we
don't check it or do anything with it. Should we do something with it?
It's also not yet clear if we should check the iterator state instead
of return values. I've added some XXX comments as a reminder. We
should also check the return value of pglz_decompress_iterate().

IMO we should provide callers with a simple iterative interface, and comparing the pointer
to the data they need against the buffer limit is an easy way to use it. Since the iterator
is mostly used for read operations, I don't think a return value is needed; instead, the
iterator should raise an error on invalid calls, such as being called again after all the
data has already been iterated. As an illustration of that calling convention (the
"needed_ptr" name is an assumption; only detoast_iterate() and the buffer fields come from
the patch discussion):
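
    /*
     * Illustrative sketch: the caller compares the pointer it needs against
     * the buffer limit and only then drives the iterator; iterating past the
     * end is reported as an error rather than through a return value.
     */
    while (needed_ptr > iter->buf->limit)
    {
        if (iter->done)
            elog(ERROR, "detoast iterator called after all data has been iterated");
        detoast_iterate(iter);
    }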
 

3). Speaking of pglz_decompress_iterate(), I diff'd it with
pglz_decompress(), and I have some questions on it:

a).
+ srcend = (const unsigned char *) (source->limit == source->capacity
? source->limit : (source->limit - 4));

What does the 4 here mean in this expression?

Since we fetch chunks one by one, if we set srcend to the source buffer limit, then inside
the loop "while (sp < srcend && dp < destend)" sp may step past the source buffer limit and
read unallocated bytes. Leaving a four-byte margin prevents sp from crossing the limit.
Once all the chunks have been read, there is no boundary to worry about, so srcend can
simply equal the source buffer limit. I've added comments explaining this in patch v6;
restated with comments along those lines (illustrative, the exact comments in v6 may differ):
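
    /*
     * While more chunks may still arrive (limit < capacity), stop the
     * decompression loop four bytes short of the current buffer limit so
     * that sp cannot step past the bytes fetched so far and read
     * unallocated memory.  Once every chunk has been fetched
     * (limit == capacity), decompress right up to the limit.
     */
    srcend = (const unsigned char *) (source->limit == source->capacity
                                      ? source->limit : (source->limit - 4));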

 
Is it possible it's
compensating for this bit in init_toast_buffer()?

+ buf->limit = VARDATA(buf->buf);

It seems the initial limit should also depend on whether the datum is
compressed, right? Can we just do this:

+ buf->limit = buf->position;

I'm afraid not. buf->position points to the data portion of the buffer, but the chunks we
fetch may begin with header information. For a compressed datum, the first four bytes hold
the raw data size, so at initialization position sits four bytes past limit. Even though
position is normally at or below limit, this initialization causes no errors, because we
always fetch a chunk before decompressing it. A sketch of what the initialization amounts
to is below ("compressed" as a flag is an assumption; the real init_toast_buffer() code in
v6 may differ).
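
    /*
     * Illustrative sketch of init_toast_buffer() for the two cases.  For a
     * compressed datum the fetched bytes begin with a 4-byte raw-size word,
     * so the read position starts past it while the fill limit does not;
     * the usual position <= limit relationship is restored as soon as the
     * first chunk has been fetched.
     */
    buf->limit = VARDATA(buf->buf);       /* fetched chunk data is appended from here */
    buf->position = compressed
        ? VARDATA(buf->buf) + 4           /* skip the raw-size word before decompressing */
        : VARDATA(buf->buf);              /* plain datum: read from the start of the data */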
 
b).
- while (sp < srcend && dp < destend)
...
+ while (sp + 1 < srcend && dp < destend &&
...

Why is it here "sp + 1"?

Please ignore that; in patch v6 I set the inactive state of detoast_iter->ctrl to 8 so that
ctrl is parsed correctly every time.
 

4. Note that varlena.c has a static state variable, and a cleanup
function that currently does:

static void
text_position_cleanup(TextPositionState *state)
{
/* no cleanup needed */
}

It seems to be the detoast iterator could be embedded in this state
variable, and then free-ing can happen here. That has a possible
advantage that the iterator struct would be on the same cache line as
the state data. That would also remove the need to pass "iter" as a
parameter, since these functions already pass "state". I'm not sure if
this would be good for other users of the iterator, so maybe we can
hold off on that for now.

Good idea. I've implemented it in patch v6.
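
A sketch of how that could look, assuming TextPositionState gains an iterator member
(called "iter" here) and that the patch provides a free_detoast_iterator() helper; the
actual v6 names may differ:

    static void
    text_position_cleanup(TextPositionState *state)
    {
        /* release the de-TOAST iterator, if the haystack needed one */
        if (state->iter != NULL)
            free_detoast_iterator(state->iter);
    }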
 
5. Would it be a good idea to add tests (not always practical), or
more Assert()'s? You probably already know this, but as a reminder
it's good to develop with asserts enabled, but never build with them
for performance testing.

I've added more Assert()'s to check iterator state.
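For example, invariants along these lines (hypothetical, not necessarily the exact checks in v6):

    Assert(iter != NULL);
    Assert(iter->buf->limit <= iter->buf->capacity);   /* never fetch past the buffer */
    Assert(!iter->done || iter->buf->limit == iter->buf->capacity); /* done implies a full buffer */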
 

I think that's enough for now. If you have any questions or
counter-arguments, let me know. I've set the commitfest entry to
waiting on author.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

BTW, I found that the iterator comes in handy for json/jsonb operations such as finding a
field value or getting array elements. I will continue to optimize json/jsonb queries on
top of the de-TOAST iterator patch.

--
Best regards,
Binguo Bao
