Re: Infinite Interval - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: Infinite Interval |
Date | |
Msg-id | CAEZATCX-zD+EAOwNh66VVKKepNL6Rz3kwFX8eG7c5Y4mDyDqJA@mail.gmail.com Whole thread Raw |
In response to | Re: Infinite Interval (jian he <jian.universality@gmail.com>) |
Responses |
Re: Infinite Interval
Re: Infinite Interval |
List | pgsql-hackers |
On Wed, 20 Sept 2023 at 11:27, jian he <jian.universality@gmail.com> wrote: > > if I remove IntervalAggState's element: MemoryContext, it will not work. > so I don't understand what the above sentence means...... Sorry. (it's > my problem) > I don't see why it won't work. The point is to try to simplify do_interval_accum() as much as possible. Looking at the current code, I see a few places that could be simpler: + X.day = newval->day; + X.month = newval->month; + X.time = newval->time; + + temp.day = state->sumX.day; + temp.month = state->sumX.month; + temp.time = state->sumX.time; Why do we need these local variables X and temp? It could just add the values from newval directly to those in state->sumX. + /* The rest of this needs to work in the aggregate context */ + old_context = MemoryContextSwitchTo(state->agg_context); Why? It's not allocating any memory here, so I don't see a need to switch context. So basically, do_interval_accum() could be simplified to: static void do_interval_accum(IntervalAggState *state, Interval *newval) { /* Count infinite intervals separately from all else */ if (INTERVAL_IS_NOBEGIN (newval)) { state->nInfcount++; return; } if (INTERVAL_IS_NOEND(newval)) { state->pInfcount++; return; } /* Update count of finite intervals */ state->N++; /* Update sum of finite intervals */ if (unlikely(pg_add_s32_overflow(state->sumX.month, newval->month, &state->sumX.month))) ereport(ERROR, errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), errmsg("interval out of range")); if (unlikely(pg_add_s32_overflow(state->sumX.day, newval->day, &state->sumX.day))) ereport(ERROR, errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), errmsg("interval out of range")); if (unlikely(pg_add_s64_overflow(state->sumX.time, newval->time, &state->sumX.time))) ereport(ERROR, errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), errmsg("interval out of range")); return; } and that can be further refactored, as described below, and similarly for do_interval_discard(), except using pg_sub_s32/64_overflow(). > > Also, this needs to include serialization and deserialization > > functions, otherwise these aggregates will no longer be able to use > > parallel workers. That makes a big difference to queryE, if the size > > of the test data is scaled up. > > > I tried, but failed. sum(interval) result is correct, but > avg(interval) result is wrong. > > SELECT sum(b) ,avg(b) > ,avg(b) = sum(b)/count(*) as should_be_true > ,avg(b) * count(*) = sum(b) as should_be_true_too > from interval_aggtest_1m; --1million row. > The above query expects two bool columns to return true, but actually > both returns false.(spend some time found out parallel mode will make > the number of rows to 1_000_002, should be 1_000_0000). > I think the reason for your wrong results is this code in interval_avg_combine(): + if (state2->N > 0) + { + /* The rest of this needs to work in the aggregate context */ + old_context = MemoryContextSwitchTo(agg_context); + + /* Accumulate interval values */ + do_interval_accum(state1, &state2->sumX); + + MemoryContextSwitchTo(old_context); + } The problem is that using do_interval_accum() to add the 2 sums together also adds 1 to the count N, making it incorrect. This code should only be adding state2->sumX to state1->sumX, not touching state1->N. And, as in do_interval_accum(), there is no need to switch memory context. Given that there are multiple places in this file that need to add intervals, I think it makes sense to further refactor, and add a local function to add 2 finite intervals, along the lines of the code above. This can then be called from do_interval_accum(), interval_avg_combine(), and interval_pl(). And similarly for subtracting 2 finite intervals. Regards, Dean
pgsql-hackers by date: