Re: [HACKERS] multivariate statistics (v25) - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [HACKERS] multivariate statistics (v25)
Msg-id 20170316043651.ncca27wsikoxuhc6@alvherre.pgsql
In response to Re: [HACKERS] multivariate statistics (v25)  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
David Rowley wrote:

> + k = -1;
> + while ((k = bms_next_member(attnums, k)) >= 0)
> + {
> +     bool attr_found = false;
> +     for (i = 0; i < info->stakeys->dim1; i++)
> +     {
> +         if (info->stakeys->values[i] == k)
> +         {
> +             attr_found = true;
> +             break;
> +         }
> +     }
> +
> +     /* found attribute not covered by this ndistinct stats, skip */
> +     if (!attr_found)
> +     {
> +         matches = false;
> +         break;
> +     }
> + }
> 
> Would it be better just to stuff info->stakeys->values into a bitmapset and
> check it's a subset of attnums? It would mean allocating memory in the loop,
> so maybe you think otherwise, but in that case maybe StatisticExtInfo
> should store the bitmapset?

Yeah, I think StatisticExtInfo should have a bitmapset, not an
int2vector.
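
To illustrate why a set representation simplifies the quoted loop, here is a
minimal standalone sketch.  It is not the patch's code: AttrMask,
attrmask_from_array() and attrmask_is_subset() are hypothetical names, and a
plain 64-bit mask stands in for a Bitmapset so the example compiles outside
the backend.  The quoted loop verifies that every member of attnums appears
in stakeys; with both sides held as sets, that becomes a single subset test
(in the backend, bms_is_subset() would presumably play that role).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a Bitmapset: bit n set means attribute number n
 * is a member.  Only attribute numbers 0..63 fit in this sketch.
 */
typedef uint64_t AttrMask;

/* Build a mask from an array of attribute numbers (the int2vector's role). */
static AttrMask
attrmask_from_array(const int16_t *values, int nvalues)
{
    AttrMask    mask = 0;

    for (int i = 0; i < nvalues; i++)
        mask |= (AttrMask) 1 << values[i];
    return mask;
}

/* Return true if every member of "a" is also a member of "b". */
static bool
attrmask_is_subset(AttrMask a, AttrMask b)
{
    return (a & ~b) == 0;
}
```

If the statistics keys were stored as a set once, each lookup would reduce to
attrmask_is_subset(attnums, stakeys) with no per-call allocation.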

> + appendPQExpBuffer(&buf, "(dependencies)");
> 
> I think it's better practice to use appendPQExpBufferStr() when there's no
> formatting. It'll perform marginally better, which might not be important
> here, but it sets a better example for people to follow when performance is
> more critical.

FWIW this should have said "(ndistinct)" anyway :-)
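
For anyone wondering where the difference comes from, here is a rough
standalone sketch.  DemoBuf, demo_append_fmt() and demo_append_str() are
made-up stand-ins, not the PQExpBuffer implementation; the assumption is only
that the formatted variant has to run the string through a printf-style
parser while the plain variant can copy the bytes directly.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for a PQExpBuffer. */
typedef struct
{
    char        data[256];
    size_t      len;
} DemoBuf;

/* Formatted append: the text is parsed for '%' conversions first. */
static void
demo_append_fmt(DemoBuf *buf, const char *fmt, ...)
{
    size_t      avail = sizeof(buf->data) - buf->len;
    va_list     args;
    int         n;

    va_start(args, fmt);
    n = vsnprintf(buf->data + buf->len, avail, fmt, args);
    va_end(args);

    if (n < 0 || (size_t) n >= avail)
        buf->data[buf->len] = '\0';     /* did not fit: drop partial output */
    else
        buf->len += (size_t) n;
}

/* Plain append: one strlen() and one memcpy(), no format parsing. */
static void
demo_append_str(DemoBuf *buf, const char *str)
{
    size_t      avail = sizeof(buf->data) - buf->len;
    size_t      n = strlen(str);

    if (n < avail)
    {
        memcpy(buf->data + buf->len, str, n + 1);   /* include the NUL */
        buf->len += n;
    }
}
```

Both calls produce the same bytes for "(ndistinct)"; the plain variant also
avoids surprises if the literal ever contains a '%'.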

> +   change the definition of a extended statistics
> 
> "a" should be "an". Also, is statistics plural here? It's commonly mixed up
> in the patch. I think it needs standardised. I personally think if you're
> speaking of a single pg_statistic_ext row, then it should be singular. Yet,
> I'm aware you're using plural for the CREATE STATISTICS command, to me that
> feels a bit like: CREATE TABLES mytable ();  am I thinking wrongly
> somehow here?

This was discussed upthread as I recall.  This is what Merriam-Webster says on
the topic:

statistic
1   :  a single term or datum in a collection of statistics
2 a :  a quantity (as the mean of a sample) that is computed from a sample;
       specifically :  estimate 3b
  b :  a random variable that takes on the possible values of a statistic

statistics
1   :  a branch of mathematics dealing with the collection, analysis,
       interpretation, and presentation of masses of numerical data
2   :  a collection of quantitative data

Now, I think there's room to say that a single object created by the new CREATE
STATISTICS is really the latter, not the former.  I find it very weird
that a single one of these objects is named in the plural form, though, and
it looks odd all over the place.  I would rather use the term
"statistics object", and then we can continue using the singular.

> +   If a schema name is given (for example, <literal>CREATE STATISTICS
> +   myschema.mystat ...</>) then the statistics is created in the specified
> +   schema.  Otherwise it is created in the current schema.  The name of
> 
> What's created in the current schema? I thought this was just for naming?

Well, "created in a schema" means that the object is named after that
schema.  So both are the same thing.  Is this unclear in some way?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


