Home > mailing lists

Simplifying width_bucket_numeric() - Mailing list pgsql-hackers

From	Dean Rasheed
Subject	Simplifying width_bucket_numeric()
Date	July 6, 2024 15:36:41
Msg-id	CAEZATCVbJH+LE9EXW8Rk3AxLe=jbOk2yrT_AUJGGh5Rah6zoeg@mail.gmail.com Whole thread
Responses	Re: Simplifying width_bucket_numeric()
List	pgsql-hackers

Tree view

In the numeric width_bucket() code, we currently do the following:

    mul_var(&operand_var, count_var, &operand_var,
            operand_var.dscale + count_var->dscale);
    div_var(&operand_var, &bound2_var, result_var,
            select_div_scale(&operand_var, &bound2_var), true);

    if (cmp_var(result_var, count_var) >= 0)
        set_var_from_var(count_var, result_var);
    else
    {
        add_var(result_var, &const_one, result_var);
        floor_var(result_var, result_var);
    }

select_div_scale() returns a value between 16 and 1000, depending on
the dscales and weights of its inputs, so div_var() computes that many
digits after the decimal point and rounds the final digit. Assuming
the result doesn't exceed count, we then floor the result, throwing
away all those digits to get the integer part that we want.

Instead, this can be done more simply and efficiently, using division
with truncation as follows:

    mul_var(&operand_var, count_var, &operand_var,
            operand_var.dscale + count_var->dscale);
    div_var(&operand_var, &bound2_var, result_var, 0, false);
    add_var(result_var, &const_one, result_var);

This doesn't compute any digits after the decimal point, and instead
just returns the integer part of the quotient, which is much cheaper
to compute, and precisely what we want.

In fact, the current code is slightly inaccurate, because the rounding
step can incorrectly round up into the next internal bucket, for
example:

  width_bucket(6.666666666666666, 0, 10, 3) -> 2
  width_bucket(6.6666666666666666, 0, 10, 3) -> 3

though in practice that's extremely unlikely and doesn't really matter.

Patch attached. I didn't bother with any new test cases, since there
appears to be sufficient coverage already.

As a quick performance/correctness test, I ran the following:

SELECT setseed(0);
CREATE TEMP TABLE t AS
  SELECT random(-4.000000, 8.000000) op,
         random(-4.100000, -2.000000) b1,
         random(6.000000, 8.100000) b2,
         random(1, 15) c
  FROM generate_series(1, 10000000);

SELECT hash_array(array_agg(width_bucket(op, b1, b2, c))) FROM t;
-- Result not changed by patch

SELECT sum(width_bucket(op, b1, b2, c)) FROM t;
Time: 3658.962 ms (00:03.659)  -- HEAD
Time: 3089.946 ms (00:03.090)  -- with patch

Regards,
Dean

Attachment

simplify-width_bucket_numeric.patch

pgsql-hackers by date:

From: Tom Lane
Date: 06 July 2024, 14:25:33
Subject: Re: XML test error on Arch Linux

From: Andrey M. Borodin
Date: 06 July 2024, 17:36:17
Subject: Re: Add LSN <-> time conversion functionality

Simplifying width_bucket_numeric() - Mailing list pgsql-hackers

Attachment

Previous

Next