Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id 5F255F0E-3F31-4753-87D7-3C4768A2B967@gmail.com
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
> On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> Hi,
>
> Attached is an updated version of the patch, adopting the psql describe
> changes introduced by 471d55859c11b.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> <0001-multivariate-MCV-lists.patch.gz><0002-multivariate-histograms.patch.gz>

Hello Tomas,

After applying both your patches, I get a warning:

histogram.c:1284:10: warning: taking the absolute value of unsigned type 'uint32' (aka 'unsigned int') has no effect [-Wabsolute-value]
    delta = fabs(data->numrows);
            ^
histogram.c:1284:10: note: remove the call to 'fabs' since unsigned values cannot be negative
    delta = fabs(data->numrows);
            ^~~~
1 warning generated.


Looking closer at this section, there is some odd integer vs. floating point arithmetic happening
that is not necessarily wrong, but might be needlessly inefficient:
    delta = fabs(data->numrows);
    split_value = values[0].value;
    for (i = 1; i < data->numrows; i++)
    {
        if (values[i].value != values[i - 1].value)
        {
            /* are we closer to splitting the bucket in half? */
            if (fabs(i - data->numrows / 2.0) < delta)
            {
                /* let's assume we'll use this value for the split */
                split_value = values[i].value;
                delta = fabs(i - data->numrows / 2.0);
                nrows = i;
            }
        }
    }

I'm not sure the compiler will be able to optimize out the recomputation of data->numrows / 2.0
each time through the loop, since the compiler might not be able to prove to itself that data->numrows
does not get changed.  Perhaps you should compute it just once prior to entering the outer loop,
store it in a variable of integer type, round 'delta' off and store in an integer, and do integer comparisons
within the loop?  Just a thought....
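
As a rough sketch of that suggestion (not taken from the patch): the names SortItem and choose_split_row below are hypothetical stand-ins, and the value type is assumed to be a plain int for illustration. Instead of storing numrows / 2.0 in an integer, which would drop the .5 for odd row counts, the sketch compares doubled distances, |2*i - numrows|, so everything stays in integer arithmetic and the row count is read only once. It picks the same split row as the floating-point loop, including on ties.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's sorted array of values. */
typedef struct SortItem
{
    int value;
} SortItem;

/*
 * Pick the boundary between distinct values that lies closest to the
 * middle of the (sorted) input.  Returns the index of the first row
 * after the split, or 0 if all values are equal.  *split_value is set
 * to the value at that index (or to values[0].value if there is no
 * boundary).
 */
static uint32_t
choose_split_row(const SortItem *values, uint32_t numrows, int *split_value)
{
    int64_t  n = (int64_t) numrows;  /* read the row count only once */
    int64_t  best = 2 * n;           /* doubled distance; larger than any candidate */
    uint32_t nrows = 0;
    uint32_t i;

    assert(numrows > 0);
    *split_value = values[0].value;

    for (i = 1; i < numrows; i++)
    {
        if (values[i].value != values[i - 1].value)
        {
            /* twice the distance of row i from the midpoint */
            int64_t d = 2 * (int64_t) i - n;

            if (d < 0)
                d = -d;

            /* strict comparison keeps the earliest of equally good splits */
            if (d < best)
            {
                *split_value = values[i].value;
                best = d;
                nrows = i;
            }
        }
    }

    return nrows;
}
```

Doubling both sides of the comparison preserves the ordering of the distances, so no rounding of 'delta' is needed at all.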


mark
