Thread: Some memory allocations in gin fastupdate code are a bit brain dead
I recently stumbled upon the following code in ginfast.c:

	while (collector->ntuples + nentries > collector->lentuples)
	{
		collector->lentuples *= 2;
		collector->tuples = (IndexTuple *) repalloc(collector->tuples,
								sizeof(IndexTuple) * collector->lentuples);
	}

It does not seem very smart to perform the repalloc() inside the loop here, as there could be many iterations before lentuples has been doubled enough to allow storage of all the required values. The attached patch changes things around so that the repalloc() is done outside of the loop, i.e. done only once, after we've determined the correct size to reallocate to. I've also added an else condition so that we only bother checking this case when the tuples[] array is already allocated.

I tested with the following:

	create table t1 (a int[], b int[]);
	create index on t1 using gin (a,b) with (fastupdate = on);
	truncate t1;
	insert into t1
	select '{1}'::int[],
	       ('{' || string_agg(y::text, ',') || '}')::int[]
	from generate_Series(1,1000000) x, generate_Series(1,10) y
	group by x
	order by x desc;

In the above case with an unpatched master, I measured the repalloc() to consume only 0.6% of the total runtime of the INSERT, so this does not really improve performance by much, but I thought it was worth fixing nevertheless.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
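The hoisted reallocation described above can be sketched in plain C. This is an illustration of the idea only, not the attached patch: the `Collector` struct and function names are hypothetical, and standard malloc()/realloc() stand in for PostgreSQL's palloc()/repalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ginfast.c's collector; the real code
 * grows an IndexTuple array with repalloc(). */
typedef struct
{
	void  **tuples;
	int		lentuples;		/* allocated slots */
	int		ntuples;		/* slots in use */
} Collector;

/* Unpatched shape: realloc() fires on every doubling iteration. */
static void
grow_each_time(Collector *c, int nentries)
{
	while (c->ntuples + nentries > c->lentuples)
	{
		c->lentuples *= 2;
		c->tuples = realloc(c->tuples, sizeof(void *) * c->lentuples);
	}
}

/* Patched shape: settle on the final size first, reallocate once. */
static void
grow_once(Collector *c, int nentries)
{
	if (c->ntuples + nentries > c->lentuples)
	{
		do
		{
			c->lentuples *= 2;
		} while (c->ntuples + nentries > c->lentuples);
		c->tuples = realloc(c->tuples, sizeof(void *) * c->lentuples);
	}
}
```

Both shapes end up at the same array size; the second simply issues one allocator call instead of one per doubling.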
David Rowley <david.rowley@2ndquadrant.com> writes:
> I recently stumbled upon the following code in ginfast.c:

> 	while (collector->ntuples + nentries > collector->lentuples)
> 	{
> 		collector->lentuples *= 2;
> 		collector->tuples = (IndexTuple *) repalloc(collector->tuples,
> 								sizeof(IndexTuple) * collector->lentuples);
> 	}

I agree that's pretty stupid, but I wonder whether we should also try harder in the initial-allocation path. Right now, if the initial pass through this code sees, say, 3 tuples to insert, it will then work with 3 * 2^n entries in subsequent enlargements, which is likely to be pretty inefficient. Should we try to force the initial allocation to be a power of 2?

Also, do we need to worry about integer overflow?

			regards, tom lane
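Rounding the initial allocation up to a power of 2 is commonly done with a bit-smearing idiom. A minimal sketch (the function name is illustrative; PostgreSQL later gained a pg_nextpower2_32() helper in pg_bitutils.h for this purpose):

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the nearest power of 2 by smearing the highest set
 * bit into all lower positions, then adding 1.  Assumes v >= 1 and
 * that the result fits in a uint32_t. */
static uint32_t
next_pow2(uint32_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}
```

With this, an initial pass seeing 3 tuples would allocate 4 slots, so subsequent doublings stay on power-of-2 boundaries that allocators tend to handle efficiently.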
On Wed, 19 Dec 2018 at 04:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that's pretty stupid, but I wonder whether we should also try
> harder in the initial-allocation path.  Right now, if the initial
> pass through this code sees say 3 tuples to insert, it will then work
> with 3 * 2^n entries in subsequent enlargements, which is likely to
> be pretty inefficient.  Should we try to force the initial allocation
> to be a power of 2?

Yeah, I think that's a good idea.

> Also, do we need to worry about integer overflow?

Good point. I didn't think of it before, but the previous code would have ensured that we got an ERROR before any overflow could have occurred, as the repalloc() would fail once the allocation request went beyond MaxAllocSize. However, I think an overflow is sufficiently unlikely that we don't need to do any batching, but we should keep it an error and not a crash.

I propose the attached. If we hit the overflow case, we'll still attempt the repalloc(), but it'll fail, just as it would have before, just without the needless calls.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Wed, 19 Dec 2018 at 04:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, do we need to worry about integer overflow?

> Good point. I didn't think of it before, but the previous code would
> have ensured that we got an ERROR before any overflow could have
> occurred as the repalloc() would fail once the allocation request went
> beyond MaxAllocSize. However, I think an overflow is sufficiently
> unlikely that we don't need to do any batching, but we should keep it
> an error and not a crash.

Agreed.

> I propose the attached. If we hit the overflow case, we'll still
> attempt the repalloc(), but it'll fail, just as it would have before,
> just without the needless calls.

I don't think this is quite bulletproof against overflow, especially in view of the rather careless mixing of int32 and uint32 variables that exists here. The easiest way to make it bulletproof is to add an explicit test, so I did that and pushed it.

			regards, tom lane
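The shape of such an explicit test can be sketched as follows. This is an illustration of the idea, not the committed code: the function name is hypothetical, and doing the arithmetic in a 64-bit type before the 32-bit doubling can wrap is just one way to make the guard bulletproof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PostgreSQL caps ordinary palloc requests at MaxAllocSize (1 GB - 1). */
#define MAX_ALLOC_SIZE ((uint64_t) 0x3fffffff)

/* Grow *lentuples by doubling until it covers ntuples + nentries,
 * refusing explicitly if the resulting byte size would exceed what
 * the allocator accepts.  Assumes *lentuples >= 1 on entry. */
static bool
grow_lentuples(uint32_t ntuples, uint32_t nentries, uint32_t *lentuples)
{
	uint64_t	needed = (uint64_t) ntuples + nentries;
	uint64_t	newlen = *lentuples;

	while (newlen < needed)
		newlen *= 2;

	/* explicit overflow test: fail cleanly rather than wrap */
	if (newlen > MAX_ALLOC_SIZE / sizeof(void *))
		return false;			/* caller reports ERROR instead of crashing */

	*lentuples = (uint32_t) newlen;
	return true;
}
```

The point of the explicit check is that the failure mode no longer depends on repalloc() noticing an already-wrapped request; the guard fires before any narrowing arithmetic happens.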
On Thu, 20 Dec 2018 at 05:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't think this is quite bulletproof against overflow, especially
> in view of the rather careless mixing of int32 and uint32 variables
> that exists here.  The easiest way to make it bulletproof is to add
> an explicit test, so I did that and pushed it.

Thanks for tidying that up and for pushing.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services