Thread: optimizer cost calculation problem
Kenji Sugita has identified a problem with cost_sort() in costsize.c.
In the following code fragment, sortmembytes is declared as long, so

    double nruns = nbytes / (sortmembytes * 2);

may cause an integer overflow if sortmembytes exceeds 2^30, which in
turn makes the optimizer produce a wrong query plan (this actually
happened in a large PostgreSQL installation which has tons of memory).
Here is a proposed fix against current:

*** optimizer/path/costsize.c	16 Feb 2003 02:30:38 -0000	1.107
--- optimizer/path/costsize.c	31 Mar 2003 22:25:06 -0000
***************
*** 490,496 ****
  	Cost		startup_cost = input_cost;
  	Cost		run_cost = 0;
  	double		nbytes = relation_byte_size(tuples, width);
! 	long		sortmembytes = SortMem * 1024L;
  
  	if (!enable_sort)
  		startup_cost += disable_cost;
--- 490,496 ----
  	Cost		startup_cost = input_cost;
  	Cost		run_cost = 0;
  	double		nbytes = relation_byte_size(tuples, width);
! 	double		sortmembytes = SortMem * 1024.0;
  
  	if (!enable_sort)
  		startup_cost += disable_cost;
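To see the failure mode concretely, here is a minimal standalone sketch
of the arithmetic (not PostgreSQL code; the variable names merely mirror
costsize.c, and the 1 GB setting matches the report above). On a
platform whose long is 32 bits, the doubled byte count wraps negative:

    #include <stdio.h>

    int
    main(void)
    {
        int    SortMem = 1048576;              /* 1 GB expressed in KB */
        long   sortmembytes = SortMem * 1024L; /* 2^30: still fits a 32-bit long */
        double nbytes = 8e9;                   /* pretend the sort input is 8 GB */

        /* With a 32-bit long, sortmembytes * 2 is evaluated in long
         * arithmetic and wraps to -2^31 on typical two's-complement
         * hardware, so nruns comes out negative. */
        double nruns_buggy = nbytes / (sortmembytes * 2);

        /* The proposed fix keeps the value in double from the start,
         * so no integer arithmetic can wrap. */
        double sortmembytes_fix = SortMem * 1024.0;
        double nruns_fixed = nbytes / (sortmembytes_fix * 2);

        printf("buggy: %g  fixed: %g\n", nruns_buggy, nruns_fixed);
        return 0;
    }

A negative nruns then poisons the rest of the sort-cost estimate, which
is how the bad plans arise.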
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Kenji Sugita has identified a problem with cost_sort() in costsize.c.
> In the following code fragment, sortmembytes is declared as long, so
>     double nruns = nbytes / (sortmembytes * 2);
> may cause an integer overflow if sortmembytes exceeds 2^30, which in
> turn makes the optimizer produce a wrong query plan (this actually
> happened in a large PostgreSQL installation which has tons of memory).

I find it really really hard to believe that it's wise to run with
sort_mem exceeding 2 gig ;-).  Does that installation have so much
RAM that it can afford to run multiple many-Gb sorts concurrently?

This is far from being the only place that multiplies SortMem by 1024.
My inclination is that a safer fix is to alter guc.c's entry for
SortMem to establish a maximum value of INT_MAX/1024 for the variable.

Probably some of the other GUC variables like shared_buffers ought to
have overflow-related maxima established, too.

			regards, tom lane
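To make the suggested guard concrete: bounding the setting at
INT_MAX/1024 kilobytes guarantees that every SortMem * 1024 byte-count
computation in the backend fits in a 32-bit int. A schematic sketch
(the clamp_sort_mem helper is hypothetical; in guc.c the limit would be
expressed declaratively as the variable's maximum, not as a check at
each use):

    #include <limits.h>

    /* Largest sort_mem (in KB) whose byte count still fits in an int:
     * INT_MAX/1024 = 2097151 KB, i.e. just under 2 GB. */
    #define SORT_MEM_MAX_KB (INT_MAX / 1024)

    /* Hypothetical illustration only. */
    static int
    clamp_sort_mem(int requested_kb)
    {
        return (requested_kb > SORT_MEM_MAX_KB) ? SORT_MEM_MAX_KB
                                                : requested_kb;
    }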
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Kenji Sugita has identified a problem with cost_sort() in costsize.c.
> > In the following code fragment, sortmembytes is declared as long, so
> >     double nruns = nbytes / (sortmembytes * 2);
> > may cause an integer overflow if sortmembytes exceeds 2^30, which in
> > turn makes the optimizer produce a wrong query plan (this actually
> > happened in a large PostgreSQL installation which has tons of memory).
>
> I find it really really hard to believe that it's wise to run with
> sort_mem exceeding 2 gig ;-).  Does that installation have so much
> RAM that it can afford to run multiple many-Gb sorts concurrently?

The process is assigned 1 gig of sort mem to speed up a batch job,
using a backend-process-only sort_mem setting, and they do not modify
postgresql.conf for ordinary users.  BTW, it's not 2 gig, but 1 gig
(remember that we do sortmembytes * 2).

> This is far from being the only place that multiplies SortMem by 1024.
> My inclination is that a safer fix is to alter guc.c's entry for
> SortMem to establish a maximum value of INT_MAX/1024 for the variable.
>
> Probably some of the other GUC variables like shared_buffers ought to
> have overflow-related maxima established, too.
>
> 			regards, tom lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> BTW, it's not 2 gig, but 1 gig (remember that we do sortmembytes * 2).

Good point.  Probably that particular calculation should be
"sortmembytes * 2.0" to force it to double before it can overflow.
But I still think we'd better limit SortMem so that the basic
SortMem*1024 calculation can't overflow (or even come close to
overflowing).

			regards, tom lane
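The difference between the two spellings is just where the promotion to
double happens; a short contrast (my own illustration, assuming a
32-bit long; names borrowed from costsize.c):

    long   sortmembytes = 1073741824L;  /* 2^30 bytes = 1 GB of sort_mem */
    double nbytes = 8e9;

    /* Evaluated entirely in long arithmetic: with a 32-bit long the
     * doubling wraps negative before the division ever sees a double. */
    double nruns_bad  = nbytes / (sortmembytes * 2);

    /* The literal 2.0 converts sortmembytes to double first, so the
     * doubling happens in floating point and cannot wrap. */
    double nruns_good = nbytes / (sortmembytes * 2.0);

Either this or Tatsuo's patch (declaring sortmembytes itself as double)
closes the hole at this particular call site; the guc.c bound protects
all the other SortMem*1024 sites at once.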
Tom wrote:
> I find it really really hard to believe that it's wise to run with
> sort_mem exceeding 2 gig ;-).  Does that installation have so much
> RAM that it can afford to run multiple many-Gb sorts concurrently?

I don't do 2 gig... but I found 0.3 gig helped on a not-too-large
system.  In the nightly load of a data warehousing application I have
a number of places where I do:

    set sort_mem=300000;
    create table new_whatevers as
      select distinct whatever from import_table;
    set sort_mem=10000;

and

    set sort_mem=100000;
    select count(*), b from a group by b;
    set sort_mem=10000;

when the optimizer likes hash aggregates.  It significantly (8 hours
-> 6 hours) reduces the nightly processing of my log-file analysis
database.

If my modest system benefited from 1/3 gig, it wouldn't surprise me if
a large system benefits from 2 gig.  If more info's useful, I'd be
happy to provide some.

Ron
On Mon, 31 Mar 2003, Tom Lane wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > BTW, it's not 2 gig, but 1 gig (remember that we do sortmembytes * 2).
>
> Good point.  Probably that particular calculation should be
> "sortmembytes * 2.0" to force it to double before it can overflow.
> But I still think we'd better limit SortMem so that the basic
> SortMem*1024 calculation can't overflow (or even come close to
> overflowing).

This isn't really an issue for 64-bit hardware, is it?
"scott.marlowe" <scott.marlowe@ihs.com> writes: > This isn't really an issue for 64 bit hardware is it? Is "int" 64 bits on such a machine? The ones I've dealt with chose to set int = 32bits, long = 64bits. If they don't do that then they have a problem with not having any native C 32bit type (and promoting short int up to 32 bits just moves the problem...) At some point we should probably try to regularize the backend code so that all memory-size-related calcs are consistently done in size_t or ssize_t arithmetic; but we're a long way from that at present. For now I think it's prudent to keep sort_mem small enough to avoid overflow in int32 arithmetic. regards, tom lane
On Tue, Apr 01, 2003 at 11:24:01AM -0500, Tom Lane wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > This isn't really an issue for 64-bit hardware, is it?
>
> Is "int" 64 bits on such a machine?  The ones I've dealt with chose to

It was for the regression tests we did recently on a 64-bit,
gcc-3.2-compiled Solaris 8 binary.  Piles of regression failures
because int was not 32 bits.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110