From: Dean Rasheed
Subject: Re: Missing free_var() at end of accum_sum_final()?
Msg-id: CAEZATCUT6k5oazZVXDFGxGpKjYcUk5G0SYzEiRtLxPtpoFMiPg@mail.gmail.com
In response to: Re: Missing free_var() at end of accum_sum_final()? ("Joel Jacobson" <joel@compiler.org>)
List: pgsql-hackers

On Mon, 20 Feb 2023 at 08:03, Joel Jacobson <joel@compiler.org> wrote:
>
> On Mon, Feb 20, 2023, at 08:38, Michael Paquier wrote:
> > Perhaps you should register this patch to the commitfest of March?  Here
> > it is:
> > https://commitfest.postgresql.org/42/
>
> Thanks, done.
>

I have been testing this a bit, and I get less impressive results than
the ones reported so far.

Testing Andres' example:

SELECT max(a + b + '17'::numeric + c)
  FROM (SELECT generate_series(1::numeric, 1000::numeric)) aa(a),
       (SELECT generate_series(1::numeric, 100::numeric)) bb(b),
       (SELECT generate_series(1::numeric, 10::numeric)) cc(c);

with HEAD, I get:

Time: 216.978 ms
Time: 215.376 ms
Time: 214.973 ms
Time: 216.288 ms
Time: 216.494 ms

and removing the free_var() from numeric_add_opt_error() I get:

Time: 212.706 ms
Time: 212.684 ms
Time: 211.378 ms
Time: 213.383 ms
Time: 213.050 ms

That's 1-2% faster, not the 6-7% that Andres saw.
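
For reference, that change just drops the trailing free_var() call. A
simplified paraphrase of numeric_add_opt_error() (not the exact source;
the special-value handling is omitted) looks like this:

Numeric
numeric_add_opt_error(Numeric num1, Numeric num2, bool *have_error)
{
    NumericVar  arg1;
    NumericVar  arg2;
    NumericVar  result;
    Numeric     res;

    /* unpack the inputs without copying their digit buffers */
    init_var_from_num(num1, &arg1);
    init_var_from_num(num2, &arg2);

    init_var(&result);
    add_var(&arg1, &arg2, &result); /* palloc's result's digit buffer */

    res = make_result_opt_error(&result, have_error);

    free_var(&result);          /* the pfree() removed in this test */

    return res;
}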

Testing the same example with the latest 0003-fixed-buf.patch, I get:

Time: 224.115 ms
Time: 225.382 ms
Time: 225.691 ms
Time: 224.135 ms
Time: 225.412 ms

which is now 4% slower.

I think the problem is that increasing the size of NumericVar
increases the stack space required, and adds some overhead to
alloc_var(). In addition, primitive operations like add_var() call
digitbuf_alloc() directly, so as it stands they don't benefit from the
static buffer. I'm also not convinced that a 4-digit static buffer
would really be of much benefit to many numeric computations anyway.
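
To make that concrete, here is my reading of the fixed-buffer approach
(an assumed shape for illustration, not the patch's exact code):
NumericVar grows a small inline digit array, and alloc_var() needs an
extra branch to use it for short values:

#define NUMERIC_FIXED_NDIGITS 4

typedef struct NumericVar
{
    int          ndigits;       /* # of digits in digits[] - can be 0 */
    int          weight;        /* weight of first digit */
    int          sign;          /* NUMERIC_POS, NUMERIC_NEG, etc. */
    int          dscale;        /* display scale */
    NumericDigit *buf;          /* start of space: palloc'd or fixed[] */
    NumericDigit *digits;       /* base-NBASE digits */
    NumericDigit  fixed[NUMERIC_FIXED_NDIGITS]; /* new inline buffer */
} NumericVar;

static void
alloc_var(NumericVar *var, int ndigits)
{
    /* free any previously palloc'd buffer, but never the inline one */
    if (var->buf != NULL && var->buf != var->fixed)
        pfree(var->buf);

    if (ndigits < NUMERIC_FIXED_NDIGITS)
        var->buf = var->fixed;              /* small value: no palloc */
    else
        var->buf = digitbuf_alloc(ndigits + 1);

    var->buf[0] = 0;            /* spare digit for rounding */
    var->digits = var->buf + 1;
    var->ndigits = ndigits;
}

Every NumericVar gets bigger whether or not the value fits in fixed[],
every alloc_var() call pays for the extra branch, and code paths like
add_var() that go straight to digitbuf_alloc() never see the inline
buffer at all.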

To try to test the real-world benefit to numeric_in(), I re-ran one of
the tests I used while testing the non-decimal integer patches, using
COPY to read a large number of random numerics from a file:

CREATE TEMP TABLE foo(c1 numeric, c2 numeric, c3 numeric, c4 numeric,
                      c5 numeric, c6 numeric, c7 numeric, c8 numeric,
                      c9 numeric, c10 numeric);

INSERT INTO foo
  SELECT trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4))
  FROM generate_series(1,10000000);
COPY foo TO '/tmp/random-numerics.csv';

\timing on
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';

With HEAD, this gave:

Time: 10750.298 ms (00:10.750)
Time: 10746.248 ms (00:10.746)
Time: 10772.277 ms (00:10.772)
Time: 10758.282 ms (00:10.758)
Time: 10760.425 ms (00:10.760)

and with 0003-fixed-buf.patch, it gave:

Time: 10623.254 ms (00:10.623)
Time: 10463.814 ms (00:10.464)
Time: 10461.700 ms (00:10.462)
Time: 10429.436 ms (00:10.429)
Time: 10438.359 ms (00:10.438)

So that's a 2-3% gain, which might be worthwhile, if not for the
slowdown in the other case.

I actually had a slightly different idea to improve numeric.c's memory
management, which gives a noticeable improvement for a few of the
simple numeric functions:

A common pattern in these numeric functions is to allocate memory for
the NumericVar's digit buffer while doing the computation, allocate
more memory for the Numeric result, copy the digits across, and then
free the NumericVar's digit buffer.

That can be reduced to just 1 palloc() and no pfree()'s by ensuring
that the initial allocation is large enough to hold the final Numeric
result, and then re-using that memory instead of allocating more. That
can be applied to all the numeric functions, saving a palloc() and a
pfree() in each case, and it fits quite well with the way
make_result() is used in all but one case (generate_series() needs to
be a little more careful to avoid trampling on the digit buffer of the
current value).
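
A minimal sketch of that idea (just an illustration of the pattern;
the helper name and the header bookkeeping here are made up, not the
actual patch):

/*
 * Hypothetical helper: allocate the digit buffer with enough headroom
 * in front of it for the Numeric header, so the same palloc'd block
 * can later be handed back as the result datum.
 */
static void
alloc_var_for_result(NumericVar *var, int ndigits)
{
    /* one allocation, sized for the digits plus the final Numeric */
    char   *mem = palloc(NUMERIC_HDRSZ +
                         (ndigits + 1) * sizeof(NumericDigit));

    var->buf = (NumericDigit *) (mem + NUMERIC_HDRSZ);
    var->buf[0] = 0;            /* spare digit for rounding */
    var->digits = var->buf + 1;
    var->ndigits = ndigits;
}

make_result() can then write the numeric header directly in front of
the digits and return the same allocation, instead of palloc'ing a
second block, copying the digits across, and then pfree'ing the digit
buffer.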

In Andres' generate_series() example, this gave:

Time: 203.838 ms
Time: 206.623 ms
Time: 204.672 ms
Time: 202.434 ms
Time: 204.893 ms

which is around 5-6% faster.

In the COPY test, it gave:

Time: 10511.293 ms (00:10.511)
Time: 10504.831 ms (00:10.505)
Time: 10521.736 ms (00:10.522)
Time: 10513.039 ms (00:10.513)
Time: 10511.979 ms (00:10.512)

which is around 2% faster than HEAD, and around 0.3% slower than
0003-fixed-buf.patch.

None of this had any noticeable impact on the time to run the
regression tests, and I tried a few other simple examples, but it was
difficult to get consistent results, above the normal variation of the
test timings.

TBH, I'm yet to be convinced that any of this is actually worthwhile.
We might shave a few percent off some simple numeric operations, but I
doubt it will make much difference to more complex computations. I'd
need to see some more realistic test results, or some real evidence of
palloc/pfree causing significant overhead in a numeric computation.

Regards,
Dean
