Thread: optimizer cost calculation problem
Kenji Sugita has identified a problem with cost_sort() in costsize.c.
In the following code fragment, sortmembytes is declared as long, so

    double nruns = nbytes / (sortmembytes * 2);

may cause an integer overflow if sortmembytes exceeds 2^30, which in
turn makes the optimizer produce a wrong query plan (this actually
happened in a large PostgreSQL installation which has tons of memory).
Here is a proposed fix against current:

*** optimizer/path/costsize.c	16 Feb 2003 02:30:38 -0000	1.107
--- optimizer/path/costsize.c	31 Mar 2003 22:25:06 -0000
***************
*** 490,496 ****
  	Cost		startup_cost = input_cost;
  	Cost		run_cost = 0;
  	double		nbytes = relation_byte_size(tuples, width);
! 	long		sortmembytes = SortMem * 1024L;
  
  	if (!enable_sort)
  		startup_cost += disable_cost;
--- 490,496 ----
  	Cost		startup_cost = input_cost;
  	Cost		run_cost = 0;
  	double		nbytes = relation_byte_size(tuples, width);
! 	double		sortmembytes = SortMem * 1024.0;
  
  	if (!enable_sort)
  		startup_cost += disable_cost;
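To see the failure mode concretely, here is a minimal standalone sketch
of the arithmetic (not PostgreSQL code; the variable names merely mirror
costsize.c, and the 1 GB setting matches the report above). On a
platform whose long is 32 bits, the doubled byte count wraps negative:

    #include <stdio.h>

    int
    main(void)
    {
        int    SortMem = 1048576;              /* 1 GB expressed in KB */
        long   sortmembytes = SortMem * 1024L; /* 2^30: still fits a 32-bit long */
        double nbytes = 8e9;                   /* pretend the sort input is 8 GB */

        /* With a 32-bit long, sortmembytes * 2 is evaluated in long
         * arithmetic and wraps to -2^31 on typical two's-complement
         * hardware, so nruns comes out negative. */
        double nruns_buggy = nbytes / (sortmembytes * 2);

        /* The proposed fix keeps the value in double from the start,
         * so no integer arithmetic can wrap. */
        double sortmembytes_fix = SortMem * 1024.0;
        double nruns_fixed = nbytes / (sortmembytes_fix * 2);

        printf("buggy: %g  fixed: %g\n", nruns_buggy, nruns_fixed);
        return 0;
    }

A negative nruns then poisons the rest of the sort-cost estimate, which
is how the bad plans arise.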
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Kenji Sugita has identified a problem with cost_sort() in costsize.c.
> In the following code fragment, sortmembytes is declared as long, so
>     double nruns = nbytes / (sortmembytes * 2);
> may cause an integer overflow if sortmembytes exceeds 2^30, which in
> turn makes the optimizer produce a wrong query plan (this actually
> happened in a large PostgreSQL installation which has tons of memory).

I find it really really hard to believe that it's wise to run with
sort_mem exceeding 2 gig ;-).  Does that installation have so much
RAM that it can afford to run multiple many-Gb sorts concurrently?

This is far from being the only place that multiplies SortMem by 1024.
My inclination is that a safer fix is to alter guc.c's entry for
SortMem to establish a maximum value of INT_MAX/1024 for the variable.

Probably some of the other GUC variables like shared_buffers ought to
have overflow-related maxima established, too.

			regards, tom lane
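To make the suggested guard concrete: bounding the setting at
INT_MAX/1024 kilobytes guarantees that every SortMem * 1024 byte-count
computation in the backend fits in a 32-bit int. A schematic sketch
(the clamp_sort_mem helper is hypothetical; in guc.c the limit would be
expressed declaratively as the variable's maximum, not as a check at
each use):

    #include <limits.h>

    /* Largest sort_mem (in KB) whose byte count still fits in an int:
     * INT_MAX/1024 = 2097151 KB, i.e. just under 2 GB. */
    #define SORT_MEM_MAX_KB (INT_MAX / 1024)

    /* Hypothetical illustration only. */
    static int
    clamp_sort_mem(int requested_kb)
    {
        return (requested_kb > SORT_MEM_MAX_KB) ? SORT_MEM_MAX_KB
                                                : requested_kb;
    }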
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Kenji Sugita has identified a problem with cost_sort() in costsize.c.
> > In the following code fragment, sortmembytes is declared as long, so
> >     double nruns = nbytes / (sortmembytes * 2);
> > may cause an integer overflow if sortmembytes exceeds 2^30, which in
> > turn makes the optimizer produce a wrong query plan (this actually
> > happened in a large PostgreSQL installation which has tons of memory).
>
> I find it really really hard to believe that it's wise to run with
> sort_mem exceeding 2 gig ;-).  Does that installation have so much
> RAM that it can afford to run multiple many-Gb sorts concurrently?

The process is assigned 1 gig of sort mem to speed up a batch job,
using a backend-process-only sort_mem setting, and they do not modify
postgresql.conf for ordinary users.  BTW, it's not 2 gig, but 1 gig
(remember that we do sortmembytes * 2).

> This is far from being the only place that multiplies SortMem by 1024.
> My inclination is that a safer fix is to alter guc.c's entry for
> SortMem to establish a maximum value of INT_MAX/1024 for the variable.
>
> Probably some of the other GUC variables like shared_buffers ought to
> have overflow-related maxima established, too.
>
> 			regards, tom lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> BTW, it's not 2 gig, but 1 gig (remember that we do sortmembytes * 2).

Good point.  Probably that particular calculation should be
"sortmembytes * 2.0" to force it to double before it can overflow.
But I still think we'd better limit SortMem so that the basic
SortMem*1024 calculation can't overflow (or even come close to
overflowing).

			regards, tom lane
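The difference between the two spellings is just where the promotion to
double happens; a short contrast (my own illustration, assuming a
32-bit long; names borrowed from costsize.c):

    long   sortmembytes = 1073741824L;  /* 2^30 bytes = 1 GB of sort_mem */
    double nbytes = 8e9;

    /* Evaluated entirely in long arithmetic: with a 32-bit long the
     * doubling wraps negative before the division ever sees a double. */
    double nruns_bad  = nbytes / (sortmembytes * 2);

    /* The literal 2.0 converts sortmembytes to double first, so the
     * doubling happens in floating point and cannot wrap. */
    double nruns_good = nbytes / (sortmembytes * 2.0);

Either this or Tatsuo's patch (declaring sortmembytes itself as double)
closes the hole at this particular call site; the guc.c bound protects
all the other SortMem*1024 sites at once.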
Tom wrote:
> I find it really really hard to believe that it's wise to run with
> sort_mem exceeding 2 gig ;-).  Does that installation have so much
> RAM that it can afford to run multiple many-Gb sorts concurrently?

I don't do 2 gig... but I found 0.3 gig helped on a not-too-large
system.  In the nightly load of a data warehousing application I have
a number of places where I do:

    set sort_mem=300000;
    create table new_whatevers as
      select distinct whatever from import_table;
    set sort_mem=10000;

and

    set sort_mem=100000;
    select count(*), b from a group by b;
    set sort_mem=10000;

when the optimizer likes hash aggregates.  It significantly (8 hours
-> 6 hours) reduces the nightly processing of my log-file analysis
database.

If my modest system benefited from 1/3 gig, it wouldn't surprise me if
a large system benefits from 2 gig.  If more info's useful, I'd be
happy to provide some.

Ron
On Mon, 31 Mar 2003, Tom Lane wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > BTW, it's not 2 gig, but 1 gig (remember that we do sortmembytes * 2).
>
> Good point.  Probably that particular calculation should be
> "sortmembytes * 2.0" to force it to double before it can overflow.
> But I still think we'd better limit SortMem so that the basic
> SortMem*1024 calculation can't overflow (or even come close to
> overflowing).

This isn't really an issue for 64-bit hardware, is it?
"scott.marlowe" <scott.marlowe@ihs.com> writes: > This isn't really an issue for 64 bit hardware is it? Is "int" 64 bits on such a machine? The ones I've dealt with chose to set int = 32bits, long = 64bits. If they don't do that then they have a problem with not having any native C 32bit type (and promoting short int up to 32 bits just moves the problem...) At some point we should probably try to regularize the backend code so that all memory-size-related calcs are consistently done in size_t or ssize_t arithmetic; but we're a long way from that at present. For now I think it's prudent to keep sort_mem small enough to avoid overflow in int32 arithmetic. regards, tom lane
On Tue, Apr 01, 2003 at 11:24:01AM -0500, Tom Lane wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > This isn't really an issue for 64-bit hardware, is it?
>
> Is "int" 64 bits on such a machine?  The ones I've dealt with chose to

It was for the regression tests we did recently on a 64-bit,
gcc-3.2-compiled Solaris 8 binary.  Piles of regression failures
because int was not 32 bits.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110