Re: Missing free_var() at end of accum_sum_final()? - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: Missing free_var() at end of accum_sum_final()? |
Date | |
Msg-id | 2432830b-f63b-4cb4-a744-2e12503621b1@app.fastmail.com Whole thread Raw |
In response to | Re: Missing free_var() at end of accum_sum_final()? (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: Missing free_var() at end of accum_sum_final()?
|
List | pgsql-hackers |
Hi, My apologies, it seems my email didn't reach the list, probably due to the benchmark plot images being to large. Here is the email again, but with URLs to the images instead, and benchmark updated with results for the 0005-fixed-buf.patch. -- On Mon, Feb 20, 2023, at 12:32, Dean Rasheed wrote: > TBH, I'm yet to be convinced that any of this is actually worthwhile. > We might shave a few percent off some simple numeric operations, but I > doubt it will make much difference to more complex computations. I'd > need to see some more realistic test results, or some real evidence of > palloc/pfree causing significant overhead in a numeric computation. Thanks for testing! Good point, I agree with your conclusion; the small change to alloc_var() suggested in 0003-fixed-buf.patch probably didn't provide enough speedups to motivate the change. In the new attached patch, Andres fixed buffer idea has been implemented throughout the entire numeric.c code base. CHANGES: -------- * Instead of a bool, we now have a buf_len struct field, to keep track of the allocated capacity, which can be different from the numbers of digits currently used. buf_len == 0 indicates no memory is allocated, and fixed_buf is possibly used instead. * alloc_var() now reuses the allocated buffer if there is one big enough. * free_var() and zero_var() only call digitbuf_free() if they need to. * set_var_from_var() now also uses the fixed buffer and also reuses the allocated buffer if there is one and it's big enough. * To allow the NumericVar's on the stack to be used without having to allocate digits buf, we have to change callers e.g. add_var(), div_var(), etc, to ensure the result variable isn't the same as one of the operand NumericVars. This wasn't necessary before, since we always allocated a new digits buf for the result, which could then be assigned to the digits field when done with computations. However, this prevented us from relying solely on the existing NumericVar stack variables, so we needed a few new temporary NumericVar stack variables, to hold intermediate results, and set_var_from_var() to copy the operand into the temp var. Assert()'s have been added to such functions, add_abs(), sub_abs(), div_var(), div_var_fast(), div_var_int64() and div_var_int64(), that enforce result being a different object than the two operands. Here is one example from ceil_var() on this from the new 0004-fixed-buf.patch: HEAD: if (var->sign == NUMERIC_POS && cmp_var(var, &tmp) != 0) add_var(&tmp, &const_one, &tmp); set_var_from_var(&tmp, result); The add_var() is a problem since &tmp is both the first operand and the result. Funnily enough, the fix in this particular case, and in floor_var(), is simpler and should be faster: 0005-fixed-buf.patch: if (var->sign == NUMERIC_POS && cmp_var(var, &tmp) != 0) add_var(&tmp, &const_one, result); else set_var_from_var(&tmp, result); This is avoids the set_var_from_var() if the if-branch is taken, as the result is written directly to result. Another example from sqrt_var(): HEAD: add_var(&q_var, &a1_var, &q_var); 0005-fixed-buf.patch: set_var_from_var(&q_var, &tmp_var); add_var(&tmp_var, &a1_var, &q_var); The extra set_var_from_var() seems to be a net-win in many cases, except for the generate_series() example which doesn't have any NumericVar's on the stack, except for the first iteration. BENCHMARK: ---------- SELECT max(a + b + '17'::numeric + c) FROM (SELECT generate_series(1::numeric, 1000::numeric)) aa(a), (SELECT generate_series(1::numeric, 100::numeric)) bb(b), (SELECT generate_series(1::numeric, 10::numeric)) cc(c); HEAD: Time: 158.698 ms Time: 142.486 ms Time: 141.443 ms Time: 142.044 ms Time: 141.651 ms 0005-fixed-buf.patch: Time: 156.371 ms Time: 149.857 ms Time: 150.674 ms Time: 150.409 ms Time: 150.461 ms SELECT setseed(0.1234); CREATE TABLE t (n1 numeric, n2 numeric, n3 numeric, n4 numeric); INSERT INTO t (n1, n2, n3, n4) SELECT round(random()::numeric,2), round(random()::numeric*10,2), round(random()::numeric*100,2), round(random()::numeric*1000,2) FROM generate_series(1,1e7); CHECKPOINT; SELECT SUM(n1+n2+n3+n4) FROM t; HEAD: Time: 758.489 ms Time: 646.794 ms Time: 643.237 ms Time: 642.620 ms Time: 646.218 ms 0005-fixed-buf.patch: Time: 748.093 ms Time: 628.799 ms Time: 629.853 ms Time: 629.166 ms Time: 627.768 ms CREATE TABLE ledger (amount numeric); INSERT INTO ledger (amount) SELECT generate_series(-100000.00,100000,0.01); CHECKPOINT; SELECT SUM(amount*1.25 + 0.5) FROM ledger; HEAD: Time: 1113.080 ms (00:01.113) Time: 931.998 ms Time: 931.009 ms Time: 932.476 ms Time: 933.509 ms 0005-fixed-buf.patch: Time: 1067.298 ms (00:01.067) Time: 883.972 ms Time: 880.165 ms Time: 882.465 ms Time: 893.646 ms CREATE TEMP TABLE foo(c1 numeric, c2 numeric, c3 numeric, c4 numeric, c5 numeric, c6 numeric, c7 numeric, c8 numeric, c9 numeric, c10 numeric); INSERT INTO foo SELECT trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)), trim_scale(round(random()::numeric*1e4, 4)) FROM generate_series(1,10000000); COPY foo TO '/tmp/random-numerics.csv'; TRUNCATE foo; COPY foo FROM '/tmp/random-numerics.csv'; HEAD: Time: 8515.644 ms (00:08.516) Time: 8405.150 ms (00:08.405) Time: 8399.067 ms (00:08.399) Time: 8678.949 ms (00:08.679) Time: 8388.152 ms (00:08.388) 0005-fixed-buf.patch: Time: 8255.290 ms (00:08.255) Time: 7986.409 ms (00:07.986) Time: 8005.748 ms (00:08.006) Time: 8004.352 ms (00:08.004) Time: 8160.537 ms (00:08.161) FIXED_BUF_LEN: -------------- In 0005-fixed-buf.patch, the new FIXED_BUF_LEN def has been set to 8. I've benchmarked other values as well, 2, 4, 16, 32, but 8 seems to be the sweet spot. div_var() would benefit from FIXED_BUF_LEN 16 though. Here is the test results using different FIXED_BUF_LEN values. Pardon the images, but it's difficult to textify the two dimensional results, as we need to vary the digit length of both operands. This first image shows the execution time for numeric_add() with operands up to 20 decdigits: https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-20-digits-add.pdf.png The color scale is execution_time, where more reddish/hotter means slower, and more blueish/cooler means faster. As we can see, it gets notably cooler already with FIXED_BUF_LEN 4, up to 8 decdigits, but it also gets a bit hotter for larger numbers. With FIXED_BUF_LEN 8, it's cooler both for small numbers, but it's also cooler for larger numbers. If instead looking at numeric_div(), we can see how FIXED_BUF_LEN 16 would be an improvement: https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-20-digits-div.pdf.png The plot for numeric_mul() is a bit more difficult to read, but we can see some improvement already at FIXED_BUF_LEN 4: https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-20-digits-mul.pdf.png Note the scales are different for all these three plots. In the plot below, the scale is the same for all three operators, which can be nice to understand their relative execution time: https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-20-digits-overview.pdf.png In these plots we have only studied operands with up to 20 decdigits. For completion, here is plots up to 200 decdigits: https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/up-to-200-digits-overview.pdf.png As we can see, there is no significant observable difference, as expected, since the fixed buffer only improves moderately big numbers. And finally, here is a plot of up to 131072 decdigits: https://gist.githubusercontent.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4/raw/d99d8fdc6f34dbed255f310dfc998a429117301a/full-range-overview.pdf.png Some additional plots can be viewed at the end of this gist: https://gist.github.com/joelonsql/59bd2642d577fe4aaf2fb8b1ab7f67c4 It would be nice to avoid the additional tmp vars, as it glutters the interface. One idea maybe could be to use an additional struct field for the writing of the result. Or at least add helper-functions to avoid an extra line of code for the affected places in the code. /Joel
Attachment
pgsql-hackers by date: