Thread: Performance Enhancement/Fix for Array Utility Functions

Performance Enhancement/Fix for Array Utility Functions

From: Mike Lewis
I noticed while doing work with very large arrays that several
functions such as array_length detoast the entire array instead of
only what is required.

I found the solution to be just unpacking the header portion of the
array and ignoring the rest.  Since the header (including the
dimensions) is variable length, I just unpack the size of what the
header would be if it had MAXDIM dimensions. (Patch is attached)
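
Roughly, the approach is the following (a sketch only, not the patch
itself; the exact definition of ARR_MAX_HEADER_SIZE in the patch may
differ):

/*
 * Worst-case size of the leading, fixed-position part of an array:
 * the ArrayType struct (vl_len_, ndim, dataoffset, elemtype) plus
 * MAXDIM pairs of ints for the dims[] and lbound[] arrays.  Slicing
 * this many bytes off the front is always enough to read ndim and
 * the dimensions, whatever the actual number of dimensions is.
 */
#define ARR_MAX_HEADER_SIZE  (sizeof(ArrayType) + 2 * MAXDIM * sizeof(int))

/* then, in array_length and friends, instead of detoasting everything: */
ArrayType  *v = (ArrayType *)
    heap_tuple_untoast_attr_slice((struct varlena *) DatumGetPointer(d),
                                  0, ARR_MAX_HEADER_SIZE);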

I made a test case to demonstrate performance gains (watch out, it
creates a big table):

create temporary table foo as
    select array_agg(i) as a
    from (
        select generate_series(1,10000000) as i) as bar;
\timing
select array_length(a, 1) from foo; -- Run a few times.  First time will be cold

Results (after warming up)

Before patch:
Time: 6.251 ms
Time: 6.078 ms
Time: 5.983 ms

After patch:
Time: 0.401 ms
Time: 0.397 ms
Time: 0.441 ms
...

--
Michael Lewis
lolrus.org
mikelikespie@gmail.com


Re: Performance Enhancement/Fix for Array Utility Functions

From: Mike Lewis
Whoops, I sent the wrong patch. My apologies.  Attached is the real
patch.  Sorry, I also forgot to mention that this was made against the
9.0 alpha 4 tag.

Thanks,
Mike

--
Michael Lewis
lolrus.org
mikelikespie@gmail.com



On Wed, Mar 31, 2010 at 12:39 AM, Mike Lewis <mikelikespie@gmail.com> wrote:
> I noticed while doing work with very large arrays that several
> functions such as array_length detoast the entire array instead of
> only what is required.
>
> I found the solution to be just unpacking the header portion of the
> array and ignoring the rest.  Since the header (including the
> dimensions) is variable length, I just unpack the size of what the
> header would be if it had MAXDIM dimensions. (Patch is attached)
>
> I made a test case to demonstrate performance gains (watch out, it
> creates a big table):
>
> create temporary table foo as
>        select array_agg(i) as a
>        from (
>                select generate_series(1,10000000) as i) as bar;
> \timing
> select array_length(a, 1) from foo; -- Run a few times.  First time will be cold
>
> Results (after warming up)
>
> Before patch:
> Time: 6.251 ms
> Time: 6.078 ms
> Time: 5.983 ms
>
> After patch:
> Time: 0.401 ms
> Time: 0.397 ms
> Time: 0.441 ms
> ...
>
> --
> Michael Lewis
> lolrus.org
> mikelikespie@gmail.com
>


Re: Performance Enhancement/Fix for Array Utility Functions

From: Robert Haas
On Wed, Mar 31, 2010 at 5:08 AM, Mike Lewis <mikelikespie@gmail.com> wrote:
> Whoops, I sent the wrong patch. My apologies.  Attached is the real
> patch.  Sorry, I also forgot to mention that this was made against the
> 9.0 alpha 4 tag.

Neat.  Please add it here:

https://commitfest.postgresql.org/action/commitfest_view/open

...Robert


Re: Performance Enhancement/Fix for Array Utility Functions

From: Mike Lewis
On Wed, Mar 31, 2010 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Neat.  Please add it here:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>
> ...Robert
>

Thanks. Added it.

https://commitfest.postgresql.org/action/patch_view?id=292

-Mike


Re: Performance Enhancement/Fix for Array Utility Functions

From: Daniel Farina
On Wed, Mar 31, 2010 at 9:47 AM, Mike Lewis <mikelikespie@gmail.com> wrote:
> Thanks. Added it.
>
> https://commitfest.postgresql.org/action/patch_view?id=292

I have reviewed this patch; this is my review:

Regression tests pass with assertions enabled.

Performance gains reported by author confirmed.

The existence and naming of ARR_MAX_HEADER_SIZE are somewhat dubious,
as it is:

* Used in exactly one place (not necessarily a reason why it should
not be reified into a stand-alone definition, but something to
consider)

* The array header includes the NULL bitmap as well, but the
interpretation used by the patch does not.

I think this patch is safe, as all the array fields required come
before the null bitmap, but the naming of this definition is very
misleading.
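
For reference, the layout in question (the struct is as declared in
src/include/utils/array.h; the trailing comment is mine):

typedef struct
{
    int32   vl_len_;      /* varlena header (do not touch directly!) */
    int     ndim;         /* # of dimensions */
    int32   dataoffset;   /* offset to data, or 0 if no NULL bitmap */
    Oid     elemtype;     /* element type OID */
} ArrayType;

/*
 * Followed in memory by: int dims[ndim]; int lbound[ndim]; then the
 * NULL bitmap (only if dataoffset != 0); then the element data.  So
 * everything a header-only slice needs does precede the bitmap.
 */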

Generally I think the delimited untoasting of metadata from arrays
separately from the payload is Not A Bad Idea.

fdr


Re: Performance Enhancement/Fix for Array Utility Functions

From: Mike Lewis

> The existence and naming of ARR_MAX_HEADER_SIZE are somewhat dubious,
> as it is:

Thank you for the feedback.  I cleaned up the patch.

> * Used in exactly one place (not necessarily a reason why it should
> not be reified into a stand-alone definition, but something to
> consider)

Moved it to one definition.

> * The array header includes the NULL bitmap as well, but the
> interpretation used by the patch does not.

I renamed the macros to have NONULL in the name (hopefully that doesn't
make them too long).

I also added a comment.  Not quite sure if it's the appropriate format,
but I didn't feel it warranted three lines.

Thanks,
Mike Lewis

Re: Performance Enhancement/Fix for Array Utility Functions

From: Tom Lane
Daniel Farina <drfarina@acm.org> writes:
> Generally I think the delimited untoasting of metadata from arrays
> separately from the payload is Not A Bad Idea.

I looked at this patch a bit.  I agree that it could be a big win for
large external arrays, but ...

1. As-is, it's a significant *pessimization* for small arrays, because
the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
is not needed because the data is already not toasted.  I think there
needs to be a code path that avoids that.

2. Arrays that are large enough to be pushed out to toast storage are
almost certainly going to get compressed first.  The potential win in
this case is very limited because heap_tuple_untoast_attr_slice will
fetch and decompress the whole thing.  Admittedly this is a limitation
of the existing code and not a fault of the patch proper, but still, if
you want to make something that's generically useful, you need to do
something about that.  Perhaps pglz_decompress() could be extended with
an argument to say "decompress no more than this much" --- although that
would mean adding another test to its inner loop, so we'd need to check
for performance degradation.  I'm also unsure how to predict how much
compressed data needs to be read in to get at least N bytes of
decompressed data.
        regards, tom lane


Re: Performance Enhancement/Fix for Array Utility Functions

From: Robert Haas
On Fri, Jul 16, 2010 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Farina <drfarina@acm.org> writes:
>> Generally I think the delimited untoasting of metadata from arrays
>> separately from the payload is Not A Bad Idea.
>
> I looked at this patch a bit.  I agree that it could be a big win for
> large external arrays, but ...
>
> 1. As-is, it's a significant *pessimization* for small arrays, because
> the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
> is not needed because the data is already not toasted.  I think there
> needs to be a code path that avoids that.

This seems like it shouldn't be too hard to fix, and I think it should be fixed.

> 2. Arrays that are large enough to be pushed out to toast storage are
> almost certainly going to get compressed first.  The potential win in
> this case is very limited because heap_tuple_untoast_attr_slice will
> fetch and decompress the whole thing.  Admittedly this is a limitation
> of the existing code and not a fault of the patch proper, but still, if
> you want to make something that's generically useful, you need to do
> something about that.  Perhaps pglz_decompress() could be extended with
> an argument to say "decompress no more than this much" --- although that
> would mean adding another test to its inner loop, so we'd need to check
> for performance degradation.  I'm also unsure how to predict how much
> compressed data needs to be read in to get at least N bytes of
> decompressed data.

But this part seems to me to be something that can, and probably
should, be left for a separate patch.  I don't see that we're losing
anything this way; we're just not winning as much as we otherwise
might.  To really fix this, I suspect we'd need a version of
pglz_decompress that can operate in "streaming mode"; you tell it how
many bytes you want, and it returns a code indicating that either (a)
it decompressed that many bytes or (b) it decompressed less than that
many bytes and you can call it again with another chunk of data if you
want to keep going.  That'd probably be a good thing to have, but I
don't think it needs to be a prerequisite for this patch.
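
The interface might look something like this (purely hypothetical; every
name here is invented, and nothing like it exists in pg_lzcompress
today):

typedef enum
{
    PGLZ_DECOMPRESS_DONE,       /* produced the requested bytes */
    PGLZ_DECOMPRESS_NEED_MORE   /* input exhausted; call again with more */
} PGLZ_DecompressResult;

/*
 * Decompress from 'source' into 'dest' until 'dest_wanted' bytes have
 * been produced or the input runs out.  '*nwritten' reports the output
 * actually produced; 'state' carries the decompressor's position
 * between calls.
 */
extern PGLZ_DecompressResult
pglz_decompress_chunk(PGLZ_DecompressState *state,
                      const char *source, int32 slen,
                      char *dest, int32 dest_wanted,
                      int32 *nwritten);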

I'm going to mark this patch "Waiting on Author".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Performance Enhancement/Fix for Array Utility Functions

From: Mike Lewis

>> 1. As-is, it's a significant *pessimization* for small arrays, because
>> the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
>> is not needed because the data is already not toasted.  I think there
>> needs to be a code path that avoids that.
>
> This seems like it shouldn't be too hard to fix, and I think it should be fixed.

Do you have any suggestions on where to start?  I do agree that this
should be fixed as well.  I don't have too much time to dedicate to this
project.  I can try to put in some time this weekend, though, if it
isn't looking too bad.

>> 2. Arrays that are large enough to be pushed out to toast storage are
>> almost certainly going to get compressed first.  The potential win in
>> this case is very limited because heap_tuple_untoast_attr_slice will
>> fetch and decompress the whole thing.  Admittedly this is a limitation
>> of the existing code and not a fault of the patch proper, but still, if
>> you want to make something that's generically useful, you need to do
>> something about that.  Perhaps pglz_decompress() could be extended with
>> an argument to say "decompress no more than this much" --- although that
>> would mean adding another test to its inner loop, so we'd need to check
>> for performance degradation.  I'm also unsure how to predict how much
>> compressed data needs to be read in to get at least N bytes of
>> decompressed data.
>
> But this part seems to me to be something that can, and probably
> should, be left for a separate patch.  I don't see that we're losing
> anything this way; we're just not winning as much as we otherwise
> might.  To really fix this, I suspect we'd need a version of
> pglz_decompress that can operate in "streaming mode"; you tell it how
> many bytes you want, and it returns a code indicating that either (a)
> it decompressed that many bytes or (b) it decompressed less than that
> many bytes and you can call it again with another chunk of data if you
> want to keep going.  That'd probably be a good thing to have, but I
> don't think it needs to be a prerequisite for this patch.

Hopefully that's the case.  I can try tackling the first part, however.

Thanks,
Mike
--
Michael Lewis
lolrus.org
mikelikespie@gmail.com

Re: Performance Enhancement/Fix for Array Utility Functions

From: Robert Haas
On Wed, Jul 28, 2010 at 1:20 AM, Mike Lewis <mikelikespie@gmail.com> wrote:
>>> 1. As-is, it's a significant *pessimization* for small arrays, because
>>> the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
>>> is not needed because the data is already not toasted.  I think there
>>> needs to be a code path that avoids that.
>>
>> This seems like it shouldn't be too hard to fix, and I think it should be
>> fixed.
>
> Do you have any suggestions on where to start?  I do agree that this
> should be fixed as well.  I don't have too much time to dedicate to this
> project.  I can try to put in some time this weekend, though, if it
> isn't looking too bad.

Perhaps you could check VARATT_IS_EXTENDED.  If that's true, then
slice it, but if it's false, then just use the original datum.  You
might want to wrap that up in a function rather than cramming it all
in the macro definition, though.
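
For instance (a sketch only; it assumes the patch keeps a macro named
ARR_MAX_HEADER_SIZE, and the function name is invented):

static ArrayType *
fetch_array_header(struct varlena *attr)
{
    if (VARATT_IS_EXTENDED(attr))
        /* toasted and/or compressed: detoast only the leading bytes */
        return (ArrayType *)
            heap_tuple_untoast_attr_slice(attr, 0, ARR_MAX_HEADER_SIZE);

    /* already plain and uncompressed: use it in place, no palloc/copy */
    return (ArrayType *) attr;
}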

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Performance Enhancement/Fix for Array Utility Functions

From: "Kevin Grittner"
Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 1:20 AM, Mike Lewis
> <mikelikespie@gmail.com> wrote:
>>
>>>> 1. As-is, it's a significant *pessimization* for small arrays,
>>>> because the heap_tuple_untoast_attr_slice code does a
>>>> palloc/copy even when one is not needed because the data is
>>>> already not toasted.  I think there needs to be a code path
>>>> that avoids that.
>>>
>>> This seems like it shouldn't be too hard to fix, and I think it
>>> should be fixed.
>>
>> Do you have any suggestions on where to start?  I do agree that
>> this should be fixed as well.  I don't have too much time to
>> dedicate to this project.  I can try to put in some time this
>> weekend, though, if it isn't looking too bad.
>
> Perhaps you could check VARATT_IS_EXTENDED.  If that's true, then
> slice it, but if it's false, then just use the original datum.
> You might want to wrap that up in a function rather than cramming
> it all in the macro definition, though.

As Mike hasn't been able to find the time to get to this yet, I'm
marking this as "Returned with Feedback".  Hopefully the issues can
be addressed in the next five weeks and we can pick it up again in
the next CommitFest.

-Kevin


Re: Performance Enhancement/Fix for Array Utility Functions

From: Mike Lewis
I started taking a look at the internals of the detoast functions, and I
came to the conclusion that I didn't have a sufficient understanding of
what was going on to make the correct changes, nor sufficient time to
gain that understanding.  Sorry for not getting back sooner.

There are a lot of different cases in the detoast code, and I think I
would need a full understanding of the TOAST machinery first.  (For
example, I didn't even know there was pglz compression in Postgres until
one of the replies to this thread.)


Thanks,
Mike

--
Michael Lewis
lolrus.org
mikelikespie@gmail.com


On Thu, Aug 5, 2010 at 10:52 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 1:20 AM, Mike Lewis
> <mikelikespie@gmail.com> wrote:
>>
>>>> 1. As-is, it's a significant *pessimization* for small arrays,
>>>> because the heap_tuple_untoast_attr_slice code does a
>>>> palloc/copy even when one is not needed because the data is
>>>> already not toasted.  I think there needs to be a code path
>>>> that avoids that.
>>>
>>> This seems like it shouldn't be too hard to fix, and I think it
>>> should be fixed.
>>
>> Do you have any suggestions on where to start?  I do agree that
>> this should be fixed as well.  I don't have too much time to
>> dedicate to this project.  I can try to put in some time this
>> weekend, though, if it isn't looking too bad.
>
> Perhaps you could check VARATT_IS_EXTENDED.  If that's true, then
> slice it, but if it's false, then just use the original datum.
> You might want to wrap that up in a function rather than cramming
> it all in the macro definition, though.

As Mike hasn't been able to find the time to get to this yet, I'm
marking this as "Returned with Feedback".  Hopefully the issues can
be addressed in the next five weeks and we can pick it up again in
the next CommitFest.

-Kevin