Re: Interesting misbehavior of repalloc() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Interesting misbehavior of repalloc()
Date
Msg-id 28319.1186941244@sss.pgh.pa.us
In response to Re: Interesting misbehavior of repalloc()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> We could also only do the realloc-in-place only if there isn't a 4k chunk in
>> the 4k freelist. I'm imagining that usually there wouldn't be.

> Or in general, if there's a free chunk of the right size then copy to
> it, else consider realloc-in-place.  Counterintuitive but it might work.
> I'm not sure how often there wouldn't be a free chunk though ...

I experimented with this a bit.  Not doing enlarge-in-place when there's
a suitable free chunk turns out to be practically a one-line addition to
AllocSetRealloc, but the question is whether that forty-line block of
code is pulling its weight at all.  I added some debug code to log when
the different cases happen, and ran the regression tests.  (Which maybe
aren't very representative of real-world usage, but it's the best easy
test I can think of.)  What I got was
 380 successful enlarge in place
 438 blocked by new rule about available chunk
6078 other reallocs of small chunks

The "other reallocs" are ones where one of the existing limitations
prevents us from using realloc-in-place.
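
To make the three logged categories concrete, here is a minimal sketch of
the decision order being tested.  It is a toy model, not the code in
AllocSetRealloc: the enum, the function name, the free_count array and
the can_enlarge_in_place flag are all hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels; these names do not come from aset.c. */
typedef enum
{
    REALLOC_COPY_TO_FREE_CHUNK,   /* a free chunk of the target size exists */
    REALLOC_ENLARGE_IN_PLACE,     /* no free chunk, but in-place growth works */
    REALLOC_COPY_TO_NEW_CHUNK     /* neither shortcut applies */
} ReallocAction;

/*
 * Toy model of the experimental rule.
 *
 * free_count[k] is the number of free chunks in size class k (assumed
 * layout).  can_enlarge_in_place stands in for all of the pre-existing
 * limitations on realloc-in-place, which are not modeled here.
 */
ReallocAction
choose_realloc_action(const int *free_count, int target_class,
                      bool can_enlarge_in_place)
{
    /* New rule: prefer copying into an already-free chunk. */
    if (free_count[target_class] > 0)
        return REALLOC_COPY_TO_FREE_CHUNK;

    /* Otherwise fall back to the existing enlarge-in-place path. */
    if (can_enlarge_in_place)
        return REALLOC_ENLARGE_IN_PLACE;

    /* One of the existing limitations applies: allocate and copy. */
    return REALLOC_COPY_TO_NEW_CHUNK;
}
```

In this model the 380 successes are the second branch and the 6078
"other" cases are the third.  The 438 blocked cases are the subset of
the first branch in which enlarging in place would otherwise have been
possible.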

The successful enlargements broke down like this:
 12 realloc enlarge 16 -> 24
  1 realloc enlarge 16 -> 32
  1 realloc enlarge 16 -> 40
  1 realloc enlarge 16 -> 64
  1 realloc enlarge 16 -> 80
139 realloc enlarge 256 -> 512
119 realloc enlarge 512 -> 1024
 80 realloc enlarge 1024 -> 2048
 26 realloc enlarge 2048 -> 4096

Bearing in mind that the first number is the number of bytes of data
we'd have to copy if we don't enlarge-in-place, we're not saving that
much work.  (Cases involving larger chunks are passed off to libc's
realloc(), so there's never anything bigger than 2K of copying at
stake, at least when power-of-2 request sizes are used.)
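
As a rough check on "not saving that much work", the copying avoided can
be totalled as count times old chunk size over the breakdown above.  The
sketch below does that arithmetic; the struct and function names are
made up for illustration, and the table is transcribed from the logged
counts.

```c
#include <stddef.h>

/* One line of the breakdown: how many times, and the old chunk size. */
typedef struct
{
    long count;
    long old_size;   /* bytes that would be copied without in-place growth */
} EnlargeRow;

/* Transcribed from the logged successful enlargements (380 in total). */
static const EnlargeRow first_profile[] = {
    {12, 16}, {1, 16}, {1, 16}, {1, 16}, {1, 16},
    {139, 256}, {119, 512}, {80, 1024}, {26, 2048}
};

#define FIRST_PROFILE_ROWS (sizeof(first_profile) / sizeof(first_profile[0]))

/* Total number of enlargements recorded in the table. */
long
total_enlargements(const EnlargeRow *rows, size_t nrows)
{
    long total = 0;

    for (size_t i = 0; i < nrows; i++)
        total += rows[i].count;
    return total;
}

/* Total bytes of memcpy avoided by enlarging in place. */
long
bytes_copy_avoided(const EnlargeRow *rows, size_t nrows)
{
    long total = 0;

    for (size_t i = 0; i < nrows; i++)
        total += rows[i].count * rows[i].old_size;
    return total;
}
```

For the table above this comes to 231936 bytes, about 226 kB of copying
avoided over the entire regression run.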

I drilled down a bit deeper and found that most of the larger realloc's
are coming from just two places: enlargement of StringInfo buffers
(initially 256 bytes) and enlargement of scan.l's literalbuf (initially
128 bytes).  I changed the initial allocations to 1K for each of these,
and then the profile of successful realloc-in-place changes to
 12 realloc enlarge 16 -> 24
  1 realloc enlarge 16 -> 32
  1 realloc enlarge 16 -> 40
  1 realloc enlarge 16 -> 64
  1 realloc enlarge 16 -> 80
 81 realloc enlarge 1024 -> 2048
 25 realloc enlarge 2048 -> 4096

Here, all of the remaining larger realloc's are happening during CREATE
VIEW operations (while constructing the pg_rewrite rule text), which
probably need not be considered a performance-critical path.

Based on this, I conclude that the realloc-in-place code doesn't pull
its weight.  We should just remove it, and increase those penurious
initial allocations in stringinfo.c and scan.l to avoid most of the
use-cases for repalloc in the first place.
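
To illustrate why a larger initial allocation removes most of these
calls, here is a small sketch that counts enlargements for a buffer that
doubles each time it runs out of room.  Doubling is an assumption
suggested by the 256 -> 512 -> 1024 steps in the log; the function name
is hypothetical and nothing here is taken from stringinfo.c or scan.l.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many doublings a buffer of size "initial" needs before it
 * can hold "needed" bytes.  Each doubling corresponds to one repalloc
 * call in this simplified model.  Overflow for very large requests is
 * not handled; this is only meant for small illustrative sizes.
 */
int
doublings_needed(size_t initial, size_t needed)
{
    int n = 0;

    assert(initial > 0);
    while (initial < needed)
    {
        initial *= 2;
        n++;
    }
    return n;
}
```

Under this model a 1000-byte string costs two enlargements starting from
256 bytes and none starting from 1K; a 4096-byte string costs four
versus two.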

Does anyone have any other test cases to suggest?  Stuff like pgbench
isn't interesting --- it doesn't cause repalloc to be invoked at all.
        regards, tom lane

