Re: Missing free_var() at end of accum_sum_final()? - Mailing list pgsql-hackers
From: Dean Rasheed
Subject: Re: Missing free_var() at end of accum_sum_final()?
Msg-id: CAEZATCUT6k5oazZVXDFGxGpKjYcUk5G0SYzEiRtLxPtpoFMiPg@mail.gmail.com
In response to: Re: Missing free_var() at end of accum_sum_final()? ("Joel Jacobson" <joel@compiler.org>)
List: pgsql-hackers
On Mon, 20 Feb 2023 at 08:03, Joel Jacobson <joel@compiler.org> wrote:
>
> On Mon, Feb 20, 2023, at 08:38, Michael Paquier wrote:
> > Perhaps you should register this patch to the commit of March? Here
> > it is:
> > https://commitfest.postgresql.org/42/
>
> Thanks, done.
>

I have been testing this a bit, and I get less impressive results than
the ones reported so far. Testing Andres' example:

SELECT max(a + b + '17'::numeric + c)
FROM (SELECT generate_series(1::numeric, 1000::numeric)) aa(a),
     (SELECT generate_series(1::numeric, 100::numeric)) bb(b),
     (SELECT generate_series(1::numeric, 10::numeric)) cc(c);

with HEAD, I get:

Time: 216.978 ms
Time: 215.376 ms
Time: 214.973 ms
Time: 216.288 ms
Time: 216.494 ms

and removing the free_var() from numeric_add_opt_error() I get:

Time: 212.706 ms
Time: 212.684 ms
Time: 211.378 ms
Time: 213.383 ms
Time: 213.050 ms

That's 1-2% faster, not the 6-7% that Andres saw. Testing the same
example with the latest 0003-fixed-buf.patch, I get:

Time: 224.115 ms
Time: 225.382 ms
Time: 225.691 ms
Time: 224.135 ms
Time: 225.412 ms

which is now 4% slower.

I think the problem is that if you increase the size of NumericVar,
you increase the stack space required, as well as adding some overhead
to alloc_var(). Also, primitive operations like add_var() directly
call digitbuf_alloc(), so as it stands, they don't benefit from the
static buffer. Also, I'm not convinced that a 4-digit static buffer
would really be of much benefit to many numeric computations anyway.
To try to test the real-world benefit to numeric_in(), I re-ran one of
the tests I used while testing the non-decimal integer patches, using
COPY to read a large number of random numerics from a file:

CREATE TEMP TABLE foo(c1 numeric, c2 numeric, c3 numeric, c4 numeric,
                      c5 numeric, c6 numeric, c7 numeric, c8 numeric,
                      c9 numeric, c10 numeric);

INSERT INTO foo
  SELECT trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4)),
         trim_scale(round(random()::numeric*1e4, 4))
  FROM generate_series(1,10000000);

COPY foo TO '/tmp/random-numerics.csv';

\timing on
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';
TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv';

With HEAD, this gave:

Time: 10750.298 ms (00:10.750)
Time: 10746.248 ms (00:10.746)
Time: 10772.277 ms (00:10.772)
Time: 10758.282 ms (00:10.758)
Time: 10760.425 ms (00:10.760)

and with 0003-fixed-buf.patch, it gave:

Time: 10623.254 ms (00:10.623)
Time: 10463.814 ms (00:10.464)
Time: 10461.700 ms (00:10.462)
Time: 10429.436 ms (00:10.429)
Time: 10438.359 ms (00:10.438)

So that's a 2-3% gain, which might be worthwhile, if not for the
slowdown in the other case.
I actually had a slightly different idea to improve numeric.c's memory
management, which gives a noticeable improvement for a few of the
simple numeric functions:

A common pattern in these numeric functions is to allocate memory for
the NumericVar's digit buffer while doing the computation, allocate
more memory for the Numeric result, copy the digits across, and then
free the NumericVar's digit buffer. That can be reduced to just 1
palloc() and no pfree()'s by ensuring that the initial allocation is
large enough to hold the final Numeric result, and then re-using that
memory instead of allocating more.

That can be applied to all the numeric functions, saving a palloc()
and a pfree() in each case, and it fits quite well with the way
make_result() is used in all but one case (generate_series() needs to
be a little more careful to avoid trampling on the digit buffer of the
current value).

In Andres' generate_series() example, this gave:

Time: 203.838 ms
Time: 206.623 ms
Time: 204.672 ms
Time: 202.434 ms
Time: 204.893 ms

which is around 5-6% faster. In the COPY test, it gave:

Time: 10511.293 ms (00:10.511)
Time: 10504.831 ms (00:10.505)
Time: 10521.736 ms (00:10.522)
Time: 10513.039 ms (00:10.513)
Time: 10511.979 ms (00:10.512)

which is around 2% faster than HEAD, and around 0.3% slower than
0003-fixed-buf.patch.

None of this had any noticeable impact on the time to run the
regression tests, and I tried a few other simple examples, but it was
difficult to get consistent results above the normal variation of the
test timings.

TBH, I'm yet to be convinced that any of this is actually worthwhile.
We might shave a few percent off some simple numeric operations, but I
doubt it will make much difference to more complex computations. I'd
need to see some more realistic test results, or some real evidence of
palloc/pfree causing significant overhead in a numeric computation.

Regards,
Dean