Thread: Numeric multiplication overflow errors

Numeric multiplication overflow errors

From
Dean Rasheed
Date:
I found a couple of places where numeric multiplication suffers from
overflow errors for inputs that aren't necessarily very large in
magnitude.

The first is with the numeric * operator, which always attempts to
produce the exact result, even though the numeric type allows at most
16383 digits after the decimal point. If that limit is exceeded, an
overflow error is produced, e.g.:

SELECT (1+2e-10000) * (1+3e-10000);
ERROR:  value overflows numeric format

I can't imagine anyone actually wanting that many digits after the
decimal point, but it can happen with a sequence of multiplications,
where the number of digits after the decimal point grows with each
multiply. Throwing an error is not particularly useful, and that error
message is quite misleading, since the result is not very large.
Therefore I propose making this case round the result to 16383 digits
when that limit is exceeded, as in the first attached patch.

It's worth noting that to get correct rounding, it's necessary to
compute the full exact product (which we're actually already doing)
before rounding, as opposed to passing rscale=16383 to mul_var(), and
letting it round. The latter approach would compute a truncated
product with MUL_GUARD_DIGITS extra digits of precision, which doesn't
necessarily round the final digit the right way. The test case in the
patch is an example that would round the wrong way with a truncated
product.

I considered doing the final rounding in mul_var() (after computing
the full product), to prevent such overflows for all callers, but I
think that's the wrong approach because it risks covering up other
problems, such as the following:

While looking through the remaining numeric code, I found one other
place that has a similar problem -- when calculating the sum of
squares for aggregates like variance() and stddev(), the squares can
end up with more than 16383 digits after the decimal point. When the
query is running on a single backend, that's no problem, because the
final result is rounded to 1000 digits. However, if the query uses
parallel workers, the result from each worker is sent using
numeric_send/recv(), which attempts to convert it to a Numeric before
sending. Thus it's possible to have an aggregate query that fails if
it uses parallel workers and succeeds otherwise.

In this case, I don't think that rounding the result from each worker
is the right approach, since that can lead to the final result being
different depending on whether or not the query uses parallel workers.
Also, given that each worker is already doing the hard work of
computing these squares, it seems a waste to just discard that
information.

So the second patch fixes this by adding new numericvar_send/recv()
functions capable of sending the full-precision NumericVars from each
worker, without rounding. The first test case in this patch is an
example that would round the wrong way if the result from each worker
were rounded before being sent.

An additional benefit to this approach is that it also addresses the
issue noted in the old code about its use of numeric_send/recv() being
wasteful:

    /*
     * This is a little wasteful since make_result converts the NumericVar
     * into a Numeric and numeric_send converts it back again. Is it worth
     * splitting the tasks in numeric_send into separate functions to stop
     * this? Doing so would also remove the fmgr call overhead.
     */

So the patch converts all aggregate serialization/deserialization code
to use the new numericvar_send/recv() functions. I doubt that that
gives much in the way of a performance improvement, but it makes the
code a little neater as well as preventing overflows.

After writing that, I realised that there's another overflow risk --
if the input values are very large in magnitude, the sum of squares
could genuinely overflow the numeric type, while the final variance
could be quite small (and that can't be fixed by rounding). So this
too fails when parallel workers are used and succeeds otherwise. The
patch fixes this as well, and I added a separate test case for it.

Regards,
Dean

Attachment

Re: Numeric multiplication overflow errors

From
David Rowley
Date:
On Mon, 28 Jun 2021 at 21:16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> So the second patch fixes this by adding new numericvar_send/recv()
> functions capable of sending the full-precision NumericVars from each
> worker, without rounding. The first test case in this patch is an
> example that would round the wrong way if the result from each worker
> were rounded before being sent.

Instead of adding send/recv functions, unless I'm mistaken, it should
be possible to go the whole hog and optimise this further: instead of
having numericvar_send(), add:

static void numericvar_serialize(StringInfo buf, const NumericVar *var)
{
 /* guts of numericvar_send() here */
}

and then rename numericvar_recv to numericvar_deserialize.

That should allow the complexity to be reduced a bit further as it'll
allow you to just serialize the NumericVar into the existing buffer
rather than having the send function create a new one only to have the
caller copy it back out again into another buffer.  It also allows you
to get rid of the sumX and sumX2 vars from the serialize functions.

> An additional benefit to this approach is that it also addresses the
> issue noted in the old code about its use of numeric_send/recv() being
> wasteful:
>
>     /*
>      * This is a little wasteful since make_result converts the NumericVar
>      * into a Numeric and numeric_send converts it back again. Is it worth
>      * splitting the tasks in numeric_send into separate functions to stop
>      * this? Doing so would also remove the fmgr call overhead.
>      */
>
> So the patch converts all aggregate serialization/deserialization code
> to use the new numericvar_send/recv() functions. I doubt that that
> gives much in the way of a performance improvement, but it makes the
> code a little neater as well as preventing overflows.

I did mean to come back to that comment one day, so thanks for doing
this and for finding/fixing the overflow bugs.

It's unfortunate that we're up against Amdahl's law here.  The best
cases to parallelise have fewer groups, so we
serialise/combine/deserialise fewer groups in the best cases, meaning
the optimisations done here are not executed as much as in the
not-so-good cases.  So yeah, I agree that it might be difficult to
measure.

David



Re: Numeric multiplication overflow errors

From
Dean Rasheed
Date:
Thanks for looking!

On Mon, 28 Jun 2021 at 12:27, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Instead of adding send/recv functions, unless I'm mistaken, it should
> be possible to go the whole hog and optimise this further: instead of
> having numericvar_send(), add:
>
> static void numericvar_serialize(StringInfo buf, const NumericVar *var)
> {
>  /* guts of numericvar_send() here */
> }
>
> and then rename numericvar_recv to numericvar_deserialize.
>
> That should allow the complexity to be reduced a bit further as it'll
> allow you to just serialize the NumericVar into the existing buffer
> rather than having the send function create a new one only to have the
> caller copy it back out again into another buffer.  It also allows you
> to get rid of the sumX and sumX2 vars from the serialize functions.

Yes, agreed. That simplifies the code nicely as well as saving a buffer copy.

I'm not a fan of the *serialize() function names in numeric.c though
(e.g., the name numeric_serialize() seems quite misleading for what it
actually does). So rather than adding to those, I've kept the original
names. In the context where they're used, those names seem more
natural.

Regards,
Dean

Attachment

Re: Numeric multiplication overflow errors

From
David Rowley
Date:
Thanks for the updated patch

On Tue, 29 Jun 2021 at 22:29, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> I'm not a fan of the *serialize() function names in numeric.c though
> (e.g., the name numeric_serialize() seems quite misleading for what it
> actually does). So rather than adding to those, I've kept the original
> names. In the context where they're used, those names seem more
> natural.

I've only looked at the numeric-agg-sumX2-overflow-v2.patch one and it
all looks mostly ok.

I kinda disagree with the send/recv naming, since all you're using
them for is to serialise/deserialise the NumericVar. I'd expect
functions named *send() and *recv() to return a bytea, so that they
can be used as a type's send/recv functions. I just don't have the
same expectations for functions named serialize/deserialize; that's
all pretty specific to aggregates with internal states.

One other thing I can think of to mention is that the recv function
fetches the digits with pq_getmsgint(buf, sizeof(NumericDigit)),
whereas the send function sends them with pq_sendint16(buf,
var->digits[i]).  I understand you've just copied numeric_send/recv,
but I disagree with that too and think that both send functions should
be using pq_sendint.  This would avoid weird issues if someone were to
change the type of NumericDigit. Perhaps that would cause other
problems, but I don't think it's a good idea to bake those problems in
any further.

I also wonder if numericvar_recv() really needs all the validation
code?  We don't do any other validation during deserialisation. I see
the logic in doing this for a recv function since a client could send
us corrupt data e.g. during a binary copy.  There are currently no
external factors to account for with serial/deserial.

I'm also fine for that patch to go in as-is.  I'm just pointing out
what I noted down when looking over it. I'll let you choose if you
want to make any changes based on the above.

David



Re: Numeric multiplication overflow errors

From
David Rowley
Date:
On Thu, 1 Jul 2021 at 13:00, David Rowley <dgrowleyml@gmail.com> wrote:
> I've only looked at the numeric-agg-sumX2-overflow-v2.patch one and it
> all looks mostly ok.

I forgot to mention earlier, after looking at the code a bit more I
wondered if we should just serialise the NumericSumAccum instead of
the NumericVar.  If we're changing this, then maybe it's worth just
doing it once and making it as optimal as possible.

Right now we translate the NumericSumAccum to a NumericVar in the
serial function, only to translate it back again in the deserial
function. That's a bit of a waste of CPU effort, although it might
mean fewer bytes to copy over to the main process.  Since deserial can
be pretty hot when we've got a lot of workers throwing serialised
aggregate states at the main process as fast as they can go, reducing
the overheads in the serial part of the query could provide some big
wins.

I played about with the following case:

create table num (a int not null, b numeric not null, c numeric not null,
                  d numeric not null, e numeric not null, f numeric not null,
                  g numeric not null, h numeric not null);
insert into num select x,y,y,y,y,y,y,y
  from generate_series(1,1000000) x, generate_series(1000000000,1000000099) y
  order by x;
explain analyze select a,sum(b),sum(c),sum(d),sum(e),sum(f),sum(g),sum(h)
  from num group by a;

To try and load up the main process as much as possible, I set 128
workers to run. The profile of the main process looked like:

Master:
  14.10%  postgres            [.] AllocSetAlloc
   7.03%  postgres            [.] accum_sum_carry.part.0
   4.87%  postgres            [.] ExecInterpExpr
   3.72%  postgres            [.] numeric_sum
   3.52%  postgres            [.] accum_sum_copy
   3.06%  postgres            [.] pq_getmsgint
   2.95%  postgres            [.] palloc
   2.82%  postgres            [.] ExecStoreMinimalTuple
   2.62%  postgres            [.] accum_sum_add
   2.60%  postgres            [.] make_result_opt_error
   2.58%  postgres            [.] numeric_avg_deserialize
   2.21%  [kernel]            [k] copy_user_generic_string
   2.08%  postgres            [.] tuplehash_insert_hash_internal

So it is possible to get this stuff to show up.

Your numeric-agg-sumX2-overflow-v2.patch already speeds this up quite
a bit, so it makes me think it might be worth doing the extra work to
further reduce the overhead.

Master @ 3788c6678

Execution Time: 8306.319 ms
Execution Time: 8407.785 ms
Execution Time: 8491.056 ms

Master + numeric-agg-sumX2-overflow-v2.patch
Execution Time: 6633.278 ms
Execution Time: 6657.350 ms
Execution Time: 6568.184 ms

A possible reason we might not want to do this is that we currently
don't have a NumericSumAccum for some functions when the compiler has
a working int128 type.  At the moment we translate the int128
accumulator into a NumericVar. We could just serialize the int128 type
in those cases. It would just mean the serialised format is not as
consistent between different builds. We currently have nothing that
depends on them matching across different machines.

Do you think it's worth doing this?

David



Re: Numeric multiplication overflow errors

From
Dean Rasheed
Date:
On Thu, 1 Jul 2021 at 02:00, David Rowley <dgrowleyml@gmail.com> wrote:
>
> I kinda disagree with the send/recv naming since all you're using them
> for is to serialise/deserialise the NumericVar. Functions named
> *send() and recv() I more expect to return a bytea so they can be used
> for a type's send/recv function. I just don't have the same
> expectations for functions named serialize/deserialize. That's all
> pretty special to aggregates with internal states.

OK, on further reflection, I think it's best not to use the send/recv
names because those names suggest that these are the internal
implementations of the numeric_send/recv() functions, which they're
not.

> One other thing I can think of to mention is that the recv function
> fetches the digits with pq_getmsgint(buf, sizeof(NumericDigit)),
> whereas the send function sends them with pq_sendint16(buf,
> var->digits[i]).  I understand you've just copied numeric_send/recv,
> but I disagree with that too and think that both send functions should
> be using pq_sendint.

I have to disagree with that. pq_sendint() is marked as deprecated and
almost all callers of it have been removed. It looks like the original
motivation for that was performance (see 1de09ad8eb), but I prefer it
that way because it makes changing the binary format for sending data
more of a conscious choice.

That implies we should use pq_getmsgint(buf, sizeof(int16)) to read
NumericDigits, which I've done in numericvar_deserialize(), but I've
left numeric_recv() as it is -- these two functions have already
diverged now, and this patch is meant to be about fixing overflow
errors, so I don't want to add more scope-creep. Perhaps a follow-on
patch could introduce pq_getmsgint8/16/32() functions, deprecate
pq_getmsgint(), and convert callers to use the new functions.

> I also wonder if numericvar_recv() really needs all the validation
> code?  We don't do any other validation during deserialisation. I see
> the logic in doing this for a recv function since a client could send
> us corrupt data e.g. during a binary copy.  There are currently no
> external factors to account for with serial/deserial.

OK, fair enough. That makes it more compact and efficient.

I'll post an update in a while. Thanks for the review.

Regards,
Dean



Re: Numeric multiplication overflow errors

From
Dean Rasheed
Date:
On Thu, 1 Jul 2021 at 10:27, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> I'll post an update in a while. Thanks for the review.
>

One other thing I'm wondering about is back-patching. I was originally
thinking of these as back-patchable bug fixes, but changing the binary
format of the aggregate serialization states feels dodgy for a
back-patch.

But then, these functions are only callable in an aggregate context,
and work in pairs. It could be a problem if someone were using them to
pass state between different servers, but I see no evidence of them
being used in that way.

For reference, this will affect the following:
 - int8_avg_serialize()
 - numeric_avg_serialize()
 - numeric_poly_serialize()
 - numeric_serialize()

and the corresponding *_deserialize functions.

Regards,
Dean



Re: Numeric multiplication overflow errors

From
David Rowley
Date:
On Thu, 1 Jul 2021 at 22:03, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> One other thing I'm wondering about is back-patching. I was originally
> thinking of these as back-patchable bug fixes, but changing the binary
> format of the aggregate serialization states feels dodgy for a
> back-patch.

I was wondering about that too. I'm not sure if any extensions might
be using serial/deserial functions to communicate over multiple
servers. As far as I know, Citus does not do this, and implements
aggregates like AVG(c) over multiple nodes with SUM(c) + COUNT(c).
I'm pretty sure Citus is not the only extension doing that kind of
work, so perhaps other people are using the serial/deserial functions.

David



Re: Numeric multiplication overflow errors

From
Dean Rasheed
Date:
On Thu, 1 Jul 2021 at 06:43, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Master @ 3788c6678
>
> Execution Time: 8306.319 ms
> Execution Time: 8407.785 ms
> Execution Time: 8491.056 ms
>
> Master + numeric-agg-sumX2-overflow-v2.patch
> Execution Time: 6633.278 ms
> Execution Time: 6657.350 ms
> Execution Time: 6568.184 ms
>

Hmm, I'm a bit surprised by those numbers. I wouldn't have expected it
to be spending enough time in the serialization/deserialization code
for it to make such a difference. I was only able to measure a 2-3%
performance improvement with the same test, and that was barely above
the noise.

> A possible reason we might not want to do this is that we currently
> don't have a NumericSumAccum for some functions when the compiler has
> a working int128 type.  At the moment we translate the int128
> accumulator into a NumericVar. We could just serialize the int128 type
> in those cases. It would just mean the serialised format is not as
> consistent between different builds. We currently have nothing that
> depends on them matching across different machines.
>
> Do you think it's worth doing this?
>

I think it's probably not worth doing this. I remain sceptical that it
could give that much of a performance gain, and keeping the
platform-independent state might well be useful in the future.

Regards,
Dean



Re: Numeric multiplication overflow errors

From
David Rowley
Date:
On Fri, 2 Jul 2021 at 00:28, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Thu, 1 Jul 2021 at 06:43, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > Master @ 3788c6678
> >
> > Execution Time: 8306.319 ms
> > Execution Time: 8407.785 ms
> > Execution Time: 8491.056 ms
> >
> > Master + numeric-agg-sumX2-overflow-v2.patch
> > Execution Time: 6633.278 ms
> > Execution Time: 6657.350 ms
> > Execution Time: 6568.184 ms
> >
>
> Hmm, I'm a bit surprised by those numbers. I wouldn't have expected it
> to be spending enough time in the serialization/deserialization code
> for it to make such a difference. I was only able to measure a 2-3%
> performance improvement with the same test, and that was barely above
> the noise.

I ran this again with a few different worker counts, after tuning a
few memory settings so that there was no spilling to disk and
everything was in RAM, mostly so I could get consistent results.

Here are the results, averaged over 3 runs each:

Workers  Master   Patched  Percent
8        11094.1  11084.9  100.08%
16        8711.4   8562.6  101.74%
32        6961.4   6726.3  103.50%
64        6137.4   5854.8  104.83%
128       6090.3   5747.4  105.96%

So the gains are much less at lower worker counts.  I think this is
because most of the gains are in the serial part of the plan and with
higher worker counts that part of the plan is relatively much bigger.

So likely performance isn't too critical here, but it is something to
keep in mind.

David



Re: Numeric multiplication overflow errors

From
Dean Rasheed
Date:
On Fri, 2 Jul 2021 at 10:24, David Rowley <dgrowleyml@gmail.com> wrote:
>
> I ran this again with a few different worker counts, after tuning a
> few memory settings so that there was no spilling to disk and
> everything was in RAM, mostly so I could get consistent results.
>
> Here are the results, averaged over 3 runs each:
>
> Workers  Master   Patched  Percent
> 8        11094.1  11084.9  100.08%
> 16        8711.4   8562.6  101.74%
> 32        6961.4   6726.3  103.50%
> 64        6137.4   5854.8  104.83%
> 128       6090.3   5747.4  105.96%
>

Thanks for testing again. Those are nice looking results, and are much
more in line with what I was seeing.

> So the gains are much less at lower worker counts.  I think this is
> because most of the gains are in the serial part of the plan and with
> higher worker counts that part of the plan is relatively much bigger.
>
> So likely performance isn't too critical here, but it is something to
> keep in mind.
>

Yes, agreed. I suspect there's not much more that can be shaved off
this particular piece of code now though. Here's an update with the
last set of changes discussed.

Regards,
Dean

Attachment

Re: Numeric multiplication overflow errors

From
David Rowley
Date:
On Fri, 2 Jul 2021 at 22:55, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Here's an update with the
> last set of changes discussed.

Looks good to me.

Just the question of whether we have any problems changing the
serialised format in the back branches.  I'm not sure if that's
something we've done before. I only had a quick look at git blame in
the serial/deserial functions, and the only changes I really see,
apart from a few cosmetic ones, were a57d312a7 and 9cca11c91, both of
which just went into master.

David



Re: Numeric multiplication overflow errors

From
Ranier Vilela
Date:

On Fri, 2 Jul 2021 at 22:55, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Here's an update with the
> last set of changes discussed.

If you'll allow me a small suggestion: move the initialization of the
variable tmp_var to after the check of whether the function may run.
That saves some cycles when it doesn't run.

  /* Ensure we disallow calling when not in aggregate context */
  if (!AggCheckCallContext(fcinfo, NULL))
      elog(ERROR, "aggregate function called in non-aggregate context");

+ init_var(&tmp_var);
+

regards,
Ranier Vilela

Re: Numeric multiplication overflow errors

From
Dean Rasheed
Date:
On Fri, 2 Jul 2021 at 12:56, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 2 Jul 2021 at 22:55, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> > Here's an update with the
> > last set of changes discussed.
>
> Looks good to me.

Thanks for the review and testing!

> Just the question of whether we have any problems changing the
> serialised format in the back branches.  I'm not sure if that's
> something we've done before. I only had a quick look at git blame in
> the serial/deserial functions, and the only changes I really see,
> apart from a few cosmetic ones, were a57d312a7 and 9cca11c91, both of
> which just went into master.

Thinking about this more, I think it's best not to risk back-patching.
It *might* be safe, but it's difficult to really be sure of that. The
bug itself is pretty unlikely to ever happen in practice, hence the
lack of prior complaints, and in fact I only found it by an
examination of the code. So it doesn't seem to be worth the risk.

OTOH, the original bug, with numeric *, is one I have hit in practice,
and the fix is trivial and low risk, so I would like to backpatch that
fix.

Regards,
Dean



Re: Numeric multiplication overflow errors

From
David Rowley
Date:
On Sat, 3 Jul 2021 at 11:04, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Thinking about this more, I think it's best not to risk back-patching.
> It *might* be safe, but it's difficult to really be sure of that. The
> bug itself is pretty unlikely to ever happen in practice, hence the
> lack of prior complaints, and in fact I only found it by an
> examination of the code. So it doesn't seem to be worth the risk.

That seems like good logic to me.  Perhaps we can reconsider that
decision if users complain about it.

David



Re: Numeric multiplication overflow errors

From
Dean Rasheed
Date:
On Fri, 2 Jul 2021 at 19:48, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> If you'll allow me a small suggestion: move the initialization of the
> variable tmp_var to after the check of whether the function may run.
> That saves some cycles when it doesn't run.
>

OK, thanks. I agree, on grounds of neatness and consistency with
nearby code, so I've done it that way.

Note, however, that it won't make any difference to performance in the
way that you're suggesting -- elog() in Postgres is used for "should
never happen, unless there's a software bug" errors, rather than, say,
"might happen for certain invalid inputs" errors, so init_var() should
always be called in these functions.

Regards,
Dean



Re: Numeric multiplication overflow errors

From
Dean Rasheed
Date:
On Sun, 4 Jul 2021 at 09:43, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sat, 3 Jul 2021 at 11:04, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> > Thinking about this more, I think it's best not to risk back-patching.
> > It *might* be safe, but it's difficult to really be sure of that. The
> > bug itself is pretty unlikely to ever happen in practice, hence the
> > lack of prior complaints, and in fact I only found it by an
> > examination of the code. So it doesn't seem to be worth the risk.
>
> That seems like good logic to me.  Perhaps we can reconsider that
> decision if users complain about it.

Thanks. Pushed to master only.

I think the other part (avoiding overflows in numeric_mul) is fairly
straightforward and uncontentious, so barring objections, I'll push
and back-patch it in a couple of days or so.

Regards,
Dean

Attachment

Re: Numeric multiplication overflow errors

From
Ranier Vilela
Date:
On Mon, 5 Jul 2021 at 06:44, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Fri, 2 Jul 2021 at 19:48, Ranier Vilela <ranier.vf@gmail.com> wrote:
> >
> > If you'll allow me a small suggestion: move the initialization of the
> > variable tmp_var to after the check of whether the function may run.
> > That saves some cycles when it doesn't run.
> >
>
> OK, thanks. I agree, on grounds of neatness and consistency with
> nearby code, so I've done it that way.

Thanks.

> Note, however, that it won't make any difference to performance in the
> way that you're suggesting -- elog() in Postgres is used for "should
> never happen, unless there's a software bug" errors, rather than, say,
> "might happen for certain invalid inputs" errors, so init_var() should
> always be called in these functions.

I agree that in this case, most of the time, elog is not called. But
writing it this way follows the principle of not doing work until it
is absolutely necessary, and following that principle generally leads
to better performance.

regards,
Ranier Vilela

Re: Numeric multiplication overflow errors

From
David Rowley
Date:
On Mon, 5 Jul 2021 at 23:07, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> On Mon, 5 Jul 2021 at 06:44, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> Note, however, that it won't make any difference to performance in the
>> way that you're suggesting -- elog() in Postgres is used for "should
>> never happen, unless there's a software bug" errors, rather than, say,
>> "might happen for certain invalid inputs" errors, so init_var() should
>> always be called in these functions.
>
> I agree that in this case, most of the time, elog is not called.

You may have misunderstood what Dean meant.  elog(ERROR) calls are now
exclusively for "cannot happen" cases.  If someone gets one of these
then there's a bug to fix or something else serious has gone wrong
with the hardware.

The case you seem to be talking about would fit better if the code in
question had been ereport(ERROR).

I don't disagree that the initialisation is better to happen after the
elog. I'm just mentioning this as I wanted to make sure you knew the
difference between elog(ERROR) and ereport(ERROR).

David



Re: Numeric multiplication overflow errors

From
Ranier Vilela
Date:
On Mon, 5 Jul 2021 at 09:02, David Rowley <dgrowleyml@gmail.com> wrote:
> On Mon, 5 Jul 2021 at 23:07, Ranier Vilela <ranier.vf@gmail.com> wrote:
> >
> > On Mon, 5 Jul 2021 at 06:44, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >> Note, however, that it won't make any difference to performance in the
> >> way that you're suggesting -- elog() in Postgres is used for "should
> >> never happen, unless there's a software bug" errors, rather than, say,
> >> "might happen for certain invalid inputs" errors, so init_var() should
> >> always be called in these functions.
> >
> > I agree that in this case, most of the time, elog is not called.
>
> You may have misunderstood what Dean meant.  elog(ERROR) calls are now
> exclusively for "cannot happen" cases.  If someone gets one of these
> then there's a bug to fix or something else serious has gone wrong
> with the hardware.
>
> The case you seem to be talking about would fit better if the code in
> question had been ereport(ERROR).
>
> I don't disagree that the initialisation is better to happen after the
> elog. I'm just mentioning this as I wanted to make sure you knew the
> difference between elog(ERROR) and ereport(ERROR).

I understand the difference now, thanks for clarifying.

regards,
Ranier Vilela