Re: Infinite Interval - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: Infinite Interval
Date
Msg-id CAEZATCX-zD+EAOwNh66VVKKepNL6Rz3kwFX8eG7c5Y4mDyDqJA@mail.gmail.com
Whole thread Raw
In response to Re: Infinite Interval  (jian he <jian.universality@gmail.com>)
Responses Re: Infinite Interval
Re: Infinite Interval
List pgsql-hackers
On Wed, 20 Sept 2023 at 11:27, jian he <jian.universality@gmail.com> wrote:
>
> if I remove IntervalAggState's element: MemoryContext, it will not work.
> so I don't understand what the above sentence means...... Sorry. (it's
> my problem)
>

I don't see why it won't work. The point is to try to simplify
do_interval_accum() as much as possible. Looking at the current code,
I see a few places that could be simpler:

+    X.day = newval->day;
+    X.month = newval->month;
+    X.time = newval->time;
+
+    temp.day = state->sumX.day;
+    temp.month = state->sumX.month;
+    temp.time = state->sumX.time;

Why do we need these local variables X and temp? It could just add the
values from newval directly to those in state->sumX.

+    /* The rest of this needs to work in the aggregate context */
+    old_context = MemoryContextSwitchTo(state->agg_context);

Why? It's not allocating any memory here, so I don't see a need to
switch context.

So basically, do_interval_accum() could be simplified to:

static void
do_interval_accum(IntervalAggState *state, Interval *newval)
{
    /* Count infinite intervals separately from all else */
    if (INTERVAL_IS_NOBEGIN (newval))
    {
        state->nInfcount++;
        return;
    }
    if (INTERVAL_IS_NOEND(newval))
    {
        state->pInfcount++;
        return;
    }

    /* Update count of finite intervals */
    state->N++;

    /* Update sum of finite intervals */
    if (unlikely(pg_add_s32_overflow(state->sumX.month, newval->month,
                                     &state->sumX.month)))
        ereport(ERROR,
                errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
                errmsg("interval out of range"));

    if (unlikely(pg_add_s32_overflow(state->sumX.day, newval->day,
                                     &state->sumX.day)))
        ereport(ERROR,
                errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
                errmsg("interval out of range"));

    if (unlikely(pg_add_s64_overflow(state->sumX.time, newval->time,
                                     &state->sumX.time)))
        ereport(ERROR,
                errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
                errmsg("interval out of range"));

    return;
}

and that can be further refactored, as described below, and similarly
for do_interval_discard(), except using pg_sub_s32/64_overflow().


> > Also, this needs to include serialization and deserialization
> > functions, otherwise these aggregates will no longer be able to use
> > parallel workers. That makes a big difference to queryE, if the size
> > of the test data is scaled up.
> >
> I tried, but failed. sum(interval) result is correct, but
> avg(interval) result is wrong.
>
> SELECT sum(b) ,avg(b)
>         ,avg(b) = sum(b)/count(*) as should_be_true
>         ,avg(b) * count(*) = sum(b) as should_be_true_too
> from interval_aggtest_1m; --1million row.
> The above query expects two bool columns to return true, but actually
> both returns false.(spend some time found out parallel mode will make
> the number of rows to 1_000_002, should be 1_000_0000).
>

I think the reason for your wrong results is this code in
interval_avg_combine():

+    if (state2->N > 0)
+    {
+        /* The rest of this needs to work in the aggregate context */
+        old_context = MemoryContextSwitchTo(agg_context);
+
+        /* Accumulate interval values */
+        do_interval_accum(state1, &state2->sumX);
+
+        MemoryContextSwitchTo(old_context);
+    }

The problem is that using do_interval_accum() to add the 2 sums
together also adds 1 to the count N, making it incorrect. This code
should only be adding state2->sumX to state1->sumX, not touching
state1->N. And, as in do_interval_accum(), there is no need to switch
memory context.

Given that there are multiple places in this file that need to add
intervals, I think it makes sense to further refactor, and add a local
function to add 2 finite intervals, along the lines of the code above.
This can then be called from do_interval_accum(),
interval_avg_combine(), and interval_pl(). And similarly for
subtracting 2 finite intervals.

Regards,
Dean



pgsql-hackers by date:

Previous
From: "a.rybakina"
Date:
Subject: Re: POC, WIP: OR-clause support for indexes
Next
From: Maxim Orlov
Date:
Subject: Re: should frontend tools use syncfs() ?